Abstract. We consider fragments of first-order logic and as models we allow finite
and infinite words simultaneously. The only binary relations apart from equality
are order comparison $<$ and the successor predicate $+1$. We give characterizations
of the fragments $\Sigma_2 = \Sigma_2[<,+1]$ and $\mathrm{FO}^2 = \mathrm{FO}^2[<,+1]$ in terms of algebraic and
topological properties. To this end we introduce the factor topology over infinite
words. It turns out that a language $L$ is in $\mathrm{FO}^2 \cap \Sigma_2$ if and only if $L$ is the interior
of an $\mathrm{FO}^2$ language. Symmetrically, a language is in $\mathrm{FO}^2 \cap \Pi_2$ if and only if it is the
topological closure of an $\mathrm{FO}^2$ language. The fragment $\Delta_2 = \Sigma_2 \cap \Pi_2$ contains exactly
the clopen languages in $\mathrm{FO}^2$. In particular, over infinite words $\Delta_2$ is a strict subclass of
$\mathrm{FO}^2$. Our characterizations yield decidability of the membership problem for all these
fragments over finite and infinite words; and as a corollary we also obtain decidability
for infinite words. Moreover, we give a new decidable algebraic characterization of
dot-depth 3/2 over finite words.

Decidability of dot-depth 3/2 over finite words was first shown by Glaßer and
Schmitz in STACS 2000, and decidability of the membership problem for $\mathrm{FO}^2$ over
infinite words was shown in 1998 by Wilke in his habilitation thesis, whereas decidability
of $\Sigma_2$ over infinite words was not known before.

Keywords: infinite words, regular languages, first-order logic, automata theory, semigroups,
topology

1 Introduction

The dot-depth hierarchy of star-free languages $\mathcal{B}_n$ for $n \in \mathbb{N} + \{1/2, 1\}$ over finite words was
introduced by Brzozowski and Cohen [5]. Later, the Straubing-Thérien hierarchy $\mathcal{L}_n$ was
considered [22, 25] and a tight connection in terms of so-called wreath products was discovered
[19, 23]. It is known that both hierarchies are strict [4] and that they have very natural closure
properties [5, 18]. Effectively determining the level $n$ of a language in the dot-depth hierarchy
or the Straubing-Thérien hierarchy is one of the most challenging open problems in automata
theory. So far, the only decidable classes are $\mathcal{B}_n$ and $\mathcal{L}_n$ for $n \in \{1/2, 1, 3/2\}$, see e.g. [17] for an
overview and [10] for level $\mathcal{B}_{3/2}$.

Thomas showed that there is a one-to-one correspondence between the quantifier alternation
hierarchy of first-order logic and the dot-depth hierarchy [27]. This correspondence holds if one
allows $[<, +1, \min, \max]$ as a signature (we always assume that we have equality and predicates
for labels of positions; in order to simplify notation, these symbols are omitted here). The same
correspondence between the Straubing-Thérien hierarchy and the quantifier alternation hierarchy
holds if we restrict the signature to $[<]$, cf. [18]. In particular, all decidability results for the
dot-depth hierarchy and the Straubing-Thérien hierarchy yield decidability of the membership
problem for the respective levels of the quantifier alternation hierarchy.

The intersection $\Delta_2[<] = \Sigma_2[<] \cap \Pi_2[<]$ of the language classes $\Sigma_2[<]$ and $\Pi_2[<]$ of the quantifier
alternation hierarchy over finite words has a huge number of different characterizations, see [24]
for an overview. One of them turns out to be the first-order fragment $\mathrm{FO}^2[<]$ where one can
use (and reuse) only two variables [26]. The fragment $\mathrm{FO}^2[<]$ is a natural restriction since
three variables are already sufficient to express any first-order language over finite and infinite
words [11]. Using the wreath product principle [23, 30], one can extend $\Delta_2[<] = \mathrm{FO}^2[<]$ to
$\Delta_2[<,+1] = \mathrm{FO}^2[<,+1]$, see e.g. [14]. Decidability of $\mathrm{FO}^2[<]$ follows from the decidability of
$\Sigma_2[<]$, but there is also a more direct effective characterization: A language over finite words
is definable in $\mathrm{FO}^2[<]$ if and only if its syntactic monoid is in the variety $\mathbf{DA}$, and the latter
property is decidable. The wreath product principle yields $\mathbf{DA} * \mathbf{D}$ as an algebraic characterization
of $\mathrm{FO}^2[<,+1]$, but this does not immediately help with decidability. Almeida [1] has shown that
$\mathbf{DA} * \mathbf{D} = \mathbf{LDA}$. Now, since $\mathbf{LDA}$ is decidable, membership in $\mathrm{FO}^2[<,+1]$ is decidable. Note
that $\min$ and $\max$ do not yield additional expressive power for $\Delta_2[<]$ and $\mathrm{FO}^2[<]$.

Some of the characterizations and decidability results for the quantifier alternation hierarchy
and for $\mathrm{FO}^2[<]$ have been extended to infinite words. Decidability of $\Sigma_1[<]$ and its Boolean
closure $\mathbb{B}\Sigma_1[<]$ over infinite words is due to Perrin and Pin [15]; decidability of $\Sigma_2[<]$ over
infinite words was shown by Bojańczyk [3]. The fragments $\Delta_2[<]$ and $\mathrm{FO}^2[<]$ do not coincide for
infinite words. In particular, decidability of $\mathrm{FO}^2[<]$ does not follow from the respective result for
$\Delta_2[<]$. Decidability of $\mathrm{FO}^2[<]$ over infinite words was first shown by Wilke [31].

Over infinite words, using a conjunction of algebraic and topological properties yields further
effective characterizations of the fragments $\Sigma_2[<]$ and $\mathrm{FO}^2[<]$, cf. [7]. The key ingredient is the
alphabetic topology, which is a refinement of the usual Cantor topology. In addition, languages
in $\mathrm{FO}^2[<] \cap \Sigma_2[<]$ can be characterized using topological notions; namely, a language $L$ over
infinite words is in $\mathrm{FO}^2[<] \cap \Sigma_2[<]$ if and only if $L$ is the interior of a language in $\mathrm{FO}^2[<]$ with
respect to the alphabetic topology. By complementation, a language is in $\mathrm{FO}^2[<] \cap \Pi_2[<]$ if and
only if it is the topological closure of a language in $\mathrm{FO}^2[<]$. This shows that topology reveals
natural properties of first-order fragments over infinite words. In this paper, we continue this
line of work.

Outline We combine algebraic and topological properties in order to give effective characterizations
of $\Sigma_2[<,+1]$ (Theorem 3.1) and $\mathrm{FO}^2[<,+1]$ (Theorem 4.1) over finite and infinite words.
The key ingredient is a generalization of the alphabetic topology which we call the factor topology.
As a byproduct, we give a new effective characterization of $\Sigma_2[<,+1]$ over finite words
(Theorem 3.2), i.e., of the level 3/2 of the dot-depth hierarchy. Dually, we get a characterization of
$\Pi_2[<,+1]$ over infinite words (Theorem 3.4). Moreover, we also obtain decidability results for the
respective fragments over infinite words (in contrast to finite and infinite words simultaneously;
Corollary 3.3 and Corollary 4.2). Concerning the intersection of fragments, we show that $L$ is in
$\mathrm{FO}^2[<,+1] \cap \Sigma_2[<,+1]$ if and only if $L$ is the interior of a language in $\mathrm{FO}^2[<,+1]$ with respect
to the factor topology (Theorem 6.1) and dually, $L$ is in $\mathrm{FO}^2[<,+1] \cap \Pi_2[<,+1]$ if and only
if $L$ is the topological closure of a language in $\mathrm{FO}^2[<,+1]$ with respect to the factor topology
(Theorem 6.2). Finally, we show that $\Delta_2[<,+1]$ is a strict subclass of $\mathrm{FO}^2[<,+1]$ and that a
language $L$ is in $\Delta_2[<,+1]$ if and only if $L$ is in $\mathrm{FO}^2[<,+1]$ and clopen in the factor topology
(Theorem 5.1).

2 Preliminaries 

Words Throughout, $\Gamma$ is a finite alphabet and unless stated otherwise $u, v, w$ are finite words,
and $\alpha, \beta, \gamma$ are finite or infinite words over the alphabet $\Gamma$. The set of all finite words is $\Gamma^*$ and
the set of all infinite words is $\Gamma^\omega$. The empty word is denoted by $1$. We write $\Gamma^\infty$ for the set of all
finite and infinite words $\Gamma^* \cup \Gamma^\omega$. As usual, $\Gamma^+$ is the set of all non-empty finite words $\Gamma^* \setminus \{1\}$.
If $L$ is a subset of a monoid, then $L^*$ is the submonoid generated by $L$. For $L \subseteq \Gamma^*$ we let
$L^\omega = \{u_1 u_2 \cdots \mid u_i \in L \text{ for all } i \ge 1\}$ be the set of infinite products. We also let $L^\infty = L^* \cup L^\omega$.
The infinite product of the empty word is empty, i.e., we have $1^\omega = 1$. Thus, $L^\infty = L^\omega$ if and
only if $1 \in L$. The length of a word $w \in \Gamma^*$ is denoted by $|w|$. We write $\Gamma^k$ for all words of length
$k$ and $\Gamma^{\ge k}$ is the set of finite words of length at least $k$; similarly, $\Gamma^{<k}$ consists of all words of
length less than $k$. The prefix of length $k$ of a word $w$ is denoted by $\mathrm{first}_k(w)$; it is undefined if $w$
is shorter than $k$. Symmetrically, $\mathrm{last}_k(w)$ is the suffix of $w$ of length $k$. By $\mathrm{alph}_k(\alpha)$ we denote
the factors of length $k$ of $\alpha$, i.e.,

\[ \mathrm{alph}_k(\alpha) = \{ w \in \Gamma^k \mid \alpha = v w \beta \text{ for some } v \in \Gamma^*,\ \beta \in \Gamma^\infty \}. \]

As a special case, we have that $\mathrm{alph}_1(\alpha) = \mathrm{alph}(\alpha)$ is the alphabet (also called content) of $\alpha$.
We write $\mathrm{im}_k(\alpha)$ for those factors in $\mathrm{alph}_k(\alpha)$ which have infinitely many occurrences in $\alpha$. The
notation $\mathrm{im}_k(\alpha)$ comes from "imaginary".
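
The two factor sets can be computed directly for finite words and for ultimately periodic words $u v^\omega$. The following is a minimal sketch under that assumption; the function names (`alph_k`, `alph_k_lasso`, `im_k_lasso`) and the representation of $u v^\omega$ as a pair of Python strings are illustrative choices, not notation from the paper.

```python
def alph_k(w, k):
    """All factors of length k of the finite word w (assumes k >= 1)."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def _letter(u, v, i):
    """Letter at position i of the infinite word u v^omega (v non-empty)."""
    return u[i] if i < len(u) else v[(i - len(u)) % len(v)]


def _factors_from(u, v, k, start):
    """k-factors of u v^omega that begin at some position >= start.

    Beyond position len(u) the word is periodic with period len(v), so it
    suffices to look at one full period of starting positions.
    """
    end = max(start, len(u)) + len(v)
    return {"".join(_letter(u, v, i + j) for j in range(k))
            for i in range(start, end)}


def alph_k_lasso(u, v, k):
    """alph_k(u v^omega): all factors of length k."""
    return _factors_from(u, v, k, 0)


def im_k_lasso(u, v, k):
    """im_k(u v^omega): factors of length k occurring infinitely often.

    A factor occurs infinitely often exactly when it has an occurrence
    starting inside the periodic part.
    """
    return _factors_from(u, v, k, len(u))
```

For a finite word the set $\mathrm{im}_k$ is always empty, so no separate function is needed in that case.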

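Languages We introduce a non-standard composition $\circ_k$ for sufficiently long words. Let $k \ge 1$.
For $w \in \Gamma^*$ and $\alpha \in \Gamma^\infty$ define $w \circ_k \alpha$ by

\[ w \circ_k \alpha = v x \beta \quad \text{if there exists } x \in \Gamma^{k-1} \text{ such that } w = vx \text{ and } \alpha = x\beta. \]

Furthermore $w \circ_k 1 = w$ and $1 \circ_k \alpha = \alpha$. In all other cases $w \circ_k \alpha$ is undefined. Note that if
$u \circ_k \alpha$ is defined, then $\mathrm{alph}_k(u \circ_k \alpha) = \mathrm{alph}_k(u) \cup \mathrm{alph}_k(\alpha)$. In particular, the operation does
not introduce new factors of length $k$. For $A \subseteq \Gamma^k$ we define

\[ A^{*_k} = \{ w_1 \circ_k \cdots \circ_k w_n \mid n \ge 0,\ w_i \in A \}, \]
\[ A^{\omega_k} = \{ w_1 \circ_k w_2 \circ_k \cdots \mid w_i \in A \}, \]
\[ A^{\infty_k} = A^{*_k} \cup A^{\omega_k}, \]
\[ A^{\mathrm{im}_k} = \{ \alpha \in \Gamma^\infty \mid \mathrm{im}_k(\alpha) = A \}. \]

If $k$ is clear from the context, then we write $w \circ \alpha$ instead of $w \circ_k \alpha$, and the superscripts
$*_k$, $\infty_k$ and $\mathrm{im}_k$ are abbreviated by circled versions of $*$, $\infty$ and $\mathrm{im}$, e.g. $A^{\circledast}$ instead of
$A^{*_k}$. Note that $\Gamma^* = \emptyset^{\mathrm{im}_k}$.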
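
The composition $\circ_k$ glues two words along an overlap of length $k-1$. The following sketch implements it for finite words only (infinite right-hand operands are outside its scope); the function names and the use of `None` for "undefined" are assumptions made for illustration.

```python
def compose(w, alpha, k):
    """Return w o_k alpha for finite words, or None if it is undefined.

    The words must overlap in a factor x of length k - 1: w = v x and
    alpha = x beta give the result v x beta. The empty word (written 1 in
    the text, "" here) is neutral on both sides.
    """
    if alpha == "":
        return w
    if w == "":
        return alpha
    overlap = k - 1
    if len(w) < overlap or len(alpha) < overlap:
        return None  # no common factor x of length k - 1 can exist
    if w[len(w) - overlap:] != alpha[:overlap]:
        return None  # suffix of w and prefix of alpha disagree
    return w + alpha[overlap:]


def factors(w, k):
    """Factors of length k of a finite word (alph_k for finite words)."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}
```

For example, `compose("abc", "bcd", 3)` yields `"abcd"`, and its factors of length 3 are exactly those of `"abc"` together with those of `"bcd"`, in line with the remark that the operation introduces no new factors of length $k$. For $k = 1$ the overlap is empty and the operation is ordinary concatenation.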

A $k$-factor monomial is a language of the form

\[ P = A_1^{*_k} \circ_k u_1 \circ_k \cdots \circ_k A_s^{*_k} \circ_k u_s \circ_k A_{s+1}^{\infty_k} \]

for $u_i \in \Gamma^{\ge k}$ and $A_i \subseteq \Gamma^k$. The degree of $P$ is the length of the word $u_1 \cdots u_s$. A $k$-factor
polynomial is a finite union of $k$-factor monomials and of words of length less than $k$. A language
$L$ is a factor polynomial (resp. monomial) if there is a number $k$ such that $L$ is a $k$-factor
polynomial (resp. monomial).

Fragments of First-order Logic We think of words as labeled linear orders, and we write $x < y$
if position $x$ comes before position $y$. Similarly, $x = y + 1$ means that $x$ is the successor of $y$. A
position $x$ of a word $\alpha$ is an $a$-position if the label of $x$ in $\alpha$ is the letter $a$.

We denote by FO the first-order logic over words. Atomic formulas in FO are $\top$ (for true),
unary predicates $\lambda(x) = a$ for $a \in \Gamma$, and binary predicates $x < y$ and $x = y + 1$ for variables $x$ and
$y$. Variables range over positions in $\mathbb{N}$ and $\lambda(x) = a$ means that $x$ is an $a$-position. Formulas may
be composed using Boolean connectives as well as existential quantification $\exists x\colon \varphi$ and universal
quantification $\forall x\colon \varphi$ for $\varphi \in \mathrm{FO}$. The semantics is as usual. A sentence in FO is a formula
without free variables. Let $\varphi \in \mathrm{FO}$ be a sentence. We write $\alpha \models \varphi$ if $\alpha$ models $\varphi$. The language
defined by $\varphi$ is $L(\varphi) = \{\alpha \in \Gamma^\infty \mid \alpha \models \varphi\}$.

The fragment $\Sigma_n[\mathcal{C}]$ of FO for $\mathcal{C} \subseteq \{<, +1\}$ consists of all sentences in prenex normal form with
$n$ blocks of quantifiers starting with a block of existential quantifiers. In addition, only binary
predicates in $\mathcal{C}$ are allowed. The fragment $\Pi_n[\mathcal{C}]$ consists of negations of formulas in $\Sigma_n[\mathcal{C}]$. We
frequently identify first-order fragments with the classes of languages they define. For example,
$\Delta_n[\mathcal{C}] = \Sigma_n[\mathcal{C}] \cap \Pi_n[\mathcal{C}]$ is the class of all languages which are definable in both $\Sigma_n[\mathcal{C}]$ and $\Pi_n[\mathcal{C}]$.

Another important fragment is $\mathrm{FO}^2[\mathcal{C}]$. It consists of all sentences using (and reusing) only two
different names for the variables, say $x$ and $y$, and where only binary predicates from $\mathcal{C}$ are
allowed. Let $\mathcal{F}$ be a fragment of first-order logic. We say that $L$ is $\mathcal{F}$-definable over some subset
$K \subseteq \Gamma^\infty$ if there exists some formula $\varphi \in \mathcal{F}$ such that $L = \{\alpha \in K \mid \alpha \models \varphi\}$. We frequently use
this notion for either $K = \Gamma^*$ or $K = \Gamma^\omega$.

Finite Monoids We repeat some basic notions and properties concerning finite monoids. For
further details we refer to standard textbooks such as [16]. Let $M$ be a finite monoid. For every
such monoid there exists a number $n \ge 1$ such that $a^n = a^{2n}$ for all $a \in M$, i.e., $a^n$ is the unique
idempotent power of $a$. The set of all idempotents of $M$ is denoted by $E(M)$. We say that $M$
is aperiodic if $a^n = a^{n+1}$ for all $a \in M$. If we consider a sequence $(a_1, \ldots, a_{|M|})$ of elements
$a_i \in M$, then there exist $i, j \in \{1, \ldots, |M|\}$ and idempotent elements $e \in M a_i M$ and $f \in M a_j M$
such that $a_1 \cdots a_i e = a_1 \cdots a_i$ in $M$ and $f a_j \cdots a_{|M|} = a_j \cdots a_{|M|}$.
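
The exponent $n$ and the aperiodicity condition can be checked mechanically on a multiplication table. The sketch below assumes a finite monoid given as a list of lists `mul` over elements `0..len(mul)-1` with neutral element `one`; this encoding and the function names are illustrative assumptions, not part of the paper.

```python
def power(mul, one, a, n):
    """Compute a^n in the monoid given by the multiplication table mul."""
    result = one
    for _ in range(n):
        result = mul[result][a]
    return result


def idempotent_exponent(mul, one):
    """Smallest n >= 1 such that a^n is idempotent for every element a."""
    elements = range(len(mul))
    n = 1
    while True:
        powers = [power(mul, one, a, n) for a in elements]
        if all(mul[p][p] == p for p in powers):
            return n
        n += 1


def is_aperiodic(mul, one):
    """Check a^n = a^(n+1) for all a, with n the idempotent exponent."""
    n = idempotent_exponent(mul, one)
    return all(power(mul, one, a, n) == power(mul, one, a, n + 1)
               for a in range(len(mul)))
```

As a usage example, the two-element monoid $\{1, 0\}$ with a zero is aperiodic, whereas the cyclic group of order two is not.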

An important tool in the study of finite monoids are Green's relations. At this point, we only
introduce their ordered versions $\le_{\mathcal{R}}$, $\le_{\mathcal{L}}$, and $\le_{\mathcal{J}}$:

\[ a \le_{\mathcal{R}} b \iff aM \subseteq bM, \]
\[ a \le_{\mathcal{L}} b \iff Ma \subseteq Mb, \]
\[ a \le_{\mathcal{J}} b \iff MaM \subseteq MbM. \]

An ordered monoid $M$ is equipped with a partial order $\le$ which is compatible with multiplication,
i.e., $a \le b$ and $c \le d$ implies $ac \le bd$. We can always assume that $M$ is ordered, since equality is
a compatible partial order.

The theory of first-order fragments over finite non-empty words is presented more concisely in
the context of semigroups instead of monoids. In this paper however, we want to incorporate
finite and infinite words in a uniform model, and our approach is heavily based on allowing words
to be empty. In order to state "semigroup conditions" for monoids, we have to use surjective
homomorphisms $h \colon \Gamma^* \to M$ instead of monoids $M$ only.

Let $h \colon \Gamma^* \to M$ be a surjective homomorphism and let $e \in M$ be an idempotent. The set $P_e$
consists of all products of the form $x_0 f_1 x_1 \cdots x_{m-1} f_m x_m$ with idempotents $f_1, \ldots, f_m \in h(\Gamma^+) \subseteq M$
and elements $x_0, \ldots, x_m \in M$ satisfying the following three conditions:

\[ e \le_{\mathcal{R}} x_0 f_1, \]
\[ e \le_{\mathcal{J}} f_i x_i f_{i+1} \quad \text{for all } 1 \le i \le m-1, \]
\[ e \le_{\mathcal{L}} f_m x_m. \]

If $e \notin h(\Gamma^+)$, then we set $P_e = \{1\}$. Note that in this case we necessarily have $e = 1$ in $M$. The
notation $P_e$ is for paths in $e$. An idempotent $e$ is said to be locally path-top with respect to $h$ if
$e P_e e \le e$, i.e., if $epe \le e$ for all $p \in P_e$. Symmetrically, it is locally path-bottom with respect to $h$ if
$e P_e e \ge e$. If the underlying homomorphism is clear from the context, we omit the reference to it.
The homomorphism $h$ is locally path-top (resp. locally path-bottom) if all idempotents in $M$ are
locally path-top (resp. locally path-bottom).

Lemma 2.1 Let $h \colon \Gamma^* \to M$ be a surjective homomorphism onto a finite monoid $M$. It is
decidable whether $M$ is locally path-top.

Proof: We give an algorithm computing $P_e$ for a given idempotent $e$. We define a composition
on triples $T = E(M) \times M \times E(M)$ by $(f_1, x_1, f_2)(f_3, x_2, f_4) = (f_1, x_1 f_2 x_2, f_4)$ if $f_2 = f_3$. Else
the composition is undefined. Compute the fixed point $P$ of the equation $P = P \cup P\,T_e$ with
$T_e = \{(f_1, x_1, f_2) \in T \mid f_1, f_2 \in h(\Gamma^+),\ e \le_{\mathcal{J}} f_1 x_1 f_2\}$ and initial value $P = T_e$. This requires at
most $|M|^3$ iterations. Then $P_e$ is the set of all $x_0 f_1 x f_2 x_2$ where $(f_1, x, f_2) \in P$, $e \le_{\mathcal{R}} x_0 f_1$ and
$e \le_{\mathcal{L}} f_2 x_2$. □
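
The fixed-point computation from the proof can be sketched as follows. The encoding is an assumption made for illustration: the monoid is a multiplication table `mul` over `0..len(mul)-1` with neutral element `one`, the set `image_plus` stands for $h(\Gamma^+)$, and the order defaults to equality (which is always a compatible partial order). The function names are hypothetical.

```python
def green_ideals(mul):
    """For every b return the sets bM, Mb and MbM (as Python sets)."""
    elems = range(len(mul))
    right = {b: {mul[b][x] for x in elems} for b in elems}          # bM
    left = {b: {mul[x][b] for x in elems} for b in elems}           # Mb
    two = {b: {mul[x][y] for x in elems for y in right[b]} for b in elems}  # MbM
    return right, left, two


def paths(mul, one, image_plus, e):
    """Compute P_e for an idempotent e by the fixed point P = P u P T_e."""
    if e not in image_plus:
        return {one}
    elems = range(len(mul))
    right, left, two = green_ideals(mul)
    idem = [f for f in image_plus if mul[f][f] == f]
    # a <=_J b holds iff a lies in MbM; similarly for <=_R and <=_L
    t_e = {(f1, x, f2) for f1 in idem for x in elems for f2 in idem
           if e in two[mul[mul[f1][x]][f2]]}
    p = set(t_e)
    while True:
        new = {(f1, mul[mul[x1][f2]][x2], f4)
               for (f1, x1, f2) in p for (f3, x2, f4) in t_e if f2 == f3}
        if new <= p:
            break
        p |= new
    return {mul[mul[mul[mul[x0][f1]][x]][f2]][x2]
            for (f1, x, f2) in p for x0 in elems for x2 in elems
            if e in right[mul[x0][f1]] and e in left[mul[f2][x2]]}


def is_locally_path_top(mul, one, image_plus, leq=lambda a, b: a == b):
    """Check e p e <= e for all idempotents e and all p in P_e."""
    for e in range(len(mul)):
        if mul[e][e] != e:
            continue
        for p in paths(mul, one, image_plus, e):
            if not leq(mul[mul[e][p]][e], e):
                return False
    return True
```

With the equality order, the two-element monoid with a zero (generated by a single letter mapped to the zero) passes the test, while the cyclic group of order two fails it, as expected since locally path-top homomorphisms force aperiodicity.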

Let $h \colon \Gamma^* \to M$ be a surjective homomorphism and let $n \in \mathbb{N}$ such that $a^n$ is idempotent for
all $a \in M$. Suppose that $h$ is locally path-top. With $e = a^n$, $x_0 = a$, $f_1 = a^n$, and $x_1 = 1$, we
obtain $a^{n+1} = e x_0 f_1 x_1 e \le e = a^n$ and hence,

\[ a^n = a^{2n} \le a^{2n-1} \le \cdots \le a^{n+1} \le a^n \]

showing that $a^n = a^{n+1}$ for all $a \in M$, i.e., $M$ is aperiodic.
The homomorphism $h \colon \Gamma^* \to M$ is in $\mathbf{LDA}$ if

\[ (eaebe)^n \, eae \, (eaebe)^n = (eaebe)^n \]

for all idempotents $e \in h(\Gamma^+)$ and for all $a, b \in M$. With $e = a^n$ and $b = 1$, we see that $a^{n+1} = a^n$
for all $a \in M$, i.e., $M$ is aperiodic. If the reference to the homomorphism is clear from the context,
then we say "$M \in \mathcal{P}$" for some property $\mathcal{P}$ meaning that "$h \in \mathcal{P}$".
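
The defining equation of $\mathbf{LDA}$ can be tested by brute force over all idempotents $e \in h(\Gamma^+)$ and all pairs $a, b \in M$. The sketch below makes the same encoding assumptions as before (multiplication table `mul`, neutral element `one`, a set `image_plus` standing for $h(\Gamma^+)$); the names are illustrative only.

```python
def power(mul, one, a, n):
    """Compute a^n in the monoid given by the multiplication table mul."""
    result = one
    for _ in range(n):
        result = mul[result][a]
    return result


def idempotent_exponent(mul, one):
    """Smallest n >= 1 such that a^n is idempotent for every element a."""
    n = 1
    while not all(mul[p][p] == p
                  for p in (power(mul, one, a, n) for a in range(len(mul)))):
        n += 1
    return n


def in_lda(mul, one, image_plus):
    """Check (eaebe)^n eae (eaebe)^n = (eaebe)^n for all admissible e, a, b."""
    n = idempotent_exponent(mul, one)
    elems = range(len(mul))
    for e in image_plus:
        if mul[e][e] != e:
            continue  # only idempotents of h(Gamma^+) are considered
        for a in elems:
            eae = mul[mul[e][a]][e]
            for b in elems:
                ebe = mul[mul[e][b]][e]
                x = mul[eae][ebe]      # eae * ebe = eaebe since ee = e
                xn = power(mul, one, x, n)
                if mul[mul[xn][eae]][xn] != xn:
                    return False
    return True
```

The cubic loop over $e$, $a$, $b$ is adequate for small monoids; it is a direct transcription of the equation rather than an optimized procedure.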

Recognizability A language $L \subseteq \Gamma^\infty$ is regular if it is recognized by some extended Büchi
automaton, see e.g. [6], or equivalently, if it is definable in monadic second order logic [29].
Below, we present a more algebraic framework for recognition of $L \subseteq \Gamma^\infty$. The syntactic preorder
$\le_L$ over $\Gamma^*$ is defined as follows. We let $s \le_L t$ if for all $u, v, w \in \Gamma^*$ we have the following two
implications:

\[ u t v w^\omega \in L \Rightarrow u s v w^\omega \in L \quad \text{and} \quad u (t v)^\omega \in L \Rightarrow u (s v)^\omega \in L. \tag{1} \]

Remember that $1^\omega = 1$. Two words $s, t \in \Gamma^*$ are syntactically equivalent, written as $s \equiv_L t$, if
both $s \le_L t$ and $t \le_L s$. This is a congruence and the congruence classes $[s]_L = \{t \in \Gamma^* \mid s \equiv_L t\}$
form the syntactic monoid $\mathrm{Synt}(L)$ of $L$. The preorder $\le_L$ on words induces a partial order
$\le_L$ on congruence classes, and $(\mathrm{Synt}(L), \le_L)$ becomes an ordered monoid. It is a well-known
classical result that the syntactic monoid of a regular language $L \subseteq \Gamma^\infty$ is finite, see e.g. [15, 28].
Moreover, in this case $L$ can be written as a finite union of languages of the form $[s]_L [t]_L^\omega$ with
$s, t \in \Gamma^*$ and $st \equiv_L s$ and $t^2 \equiv_L t$.

Now, let $h \colon \Gamma^* \to M$ be any surjective homomorphism onto a finite ordered monoid $M$ and
let $L \subseteq \Gamma^\infty$. If the reference to $h$ is clear from the context, then we denote by $[s]$ the set of finite
words $h^{-1}(s)$ for $s \in M$. The following notations are used:

• $(s, e) \in M \times M$ is a linked pair if $se = s$ and $e^2 = e$.

• $h$ weakly recognizes $L$ if

\[ L = \bigcup \{ [s][e]^\omega \mid (s, e) \text{ is a linked pair and } [s][e]^\omega \subseteq L \}. \]

• $h$ strongly recognizes $L$ (or simply recognizes $L$) if

\[ L = \bigcup \{ [s][e]^\omega \mid (s, e) \text{ is a linked pair and } [s][e]^\omega \cap L \neq \emptyset \}. \]

• $L$ is downward closed (on finite prefixes) for $h$ if $[s][e]^\omega \subseteq L$ implies $[t][e]^\omega \subseteq L$ for all
$s, t, e \in M$ where $t \le s$.

Using Ramsey's Theorem, one can show that for every word $\alpha \in \Gamma^\infty$ there exists a linked pair
$(s, e)$ such that $\alpha \in [s][e]^\omega$. On the other hand, two different languages of the form $[s][e]^\omega$ are
not necessarily disjoint. Therefore, if $L$ is weakly recognized by $h$, then there could exist some
linked pair $(s, e)$ such that $[s][e]^\omega$ and $L$ are incomparable. If $L$ is strongly recognized by $h$, then
for every linked pair we have either $[s][e]^\omega \subseteq L$ or $[s][e]^\omega \cap L = \emptyset$. In particular, whenever $L$ is
strongly recognized by $h$, then $\Gamma^\infty \setminus L$ is also strongly recognized by $h$. Every regular language $L$
is strongly recognized by its syntactic homomorphism $h_L \colon \Gamma^* \to \mathrm{Synt}(L);\ s \mapsto [s]_L$. Moreover,
$L$ is downward closed for $h_L$.
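
Linked pairs are determined by the multiplication table alone, so they can be enumerated directly. The following short sketch assumes the same table encoding as in the other examples (elements `0..len(mul)-1`); the function name is an illustrative choice.

```python
def linked_pairs(mul):
    """All pairs (s, e) with s e = s and e e = e."""
    elems = range(len(mul))
    return {(s, e) for s in elems for e in elems
            if mul[e][e] == e and mul[s][e] == s}
```

For the two-element monoid with identity `0` and zero `1`, the linked pairs are `(0, 0)`, `(1, 0)` and `(1, 1)`; the pair `(0, 1)` is excluded because multiplying the identity by the zero does not give back the identity.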

2.1 The factor topology 

Topological properties play a crucial role in this paper. Very often a combination of algebraic and
topological properties yields a decidable characterization of the fragments. Moreover, topology
can be used to describe the relation between the fragments. This section introduces the topology
matching the fragments $\Sigma_2[<,+1]$ and $\Pi_2[<,+1]$.

We define the $k$-factor topology by its basis. All sets of the form $u \circ_k A^{\infty_k}$ for $u \in \Gamma^*$ and
$A \subseteq \Gamma^k$ are open. Therefore, singleton sets $\{u\}$ for $u \in \Gamma^*$ are open in the $k$-factor topology since
$\{u\} = u \circ_k \emptyset^{\infty_k}$. A language is said to be factor open (resp. factor closed) if there is a natural
number $k$ such that $L$ is open (resp. closed) in the $k$-factor topology.
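
Membership of an ultimately periodic word in a basic open set $w \circ_k A^{\infty_k}$ reduces to a prefix check plus a check of finitely many factors. The sketch below assumes the infinite word is given as $u v^\omega$ with non-empty $v$, represented by two Python strings; the function names are hypothetical.

```python
def _letter(u, v, i):
    """Letter at position i of the infinite word u v^omega (v non-empty)."""
    return u[i] if i < len(u) else v[(i - len(u)) % len(v)]


def in_basic_open(w, factor_set, k, u, v):
    """Decide whether u v^omega lies in the basic open set w o_k A^{infty_k}.

    An infinite word belongs to the set iff it starts with w, the word w has
    length at least k - 1, and every factor of length k that begins at
    position len(w) - (k - 1) or later is contained in the set A.
    """
    if len(w) < k - 1:
        return False  # w o_k x is undefined for every non-empty product x
    if any(_letter(u, v, i) != w[i] for i in range(len(w))):
        return False  # w is not a prefix of u v^omega
    start = len(w) - (k - 1)
    end = max(start, len(u)) + len(v)  # one full period of start positions
    return all("".join(_letter(u, v, i + j) for j in range(k)) in factor_set
               for i in range(start, end))
```

For instance, the word $aab(ab)^\omega$ lies in $aa \circ_2 \{ab, ba\}^{\infty_2}$ but not in $a \circ_2 \{ab, ba\}^{\infty_2}$, because in the second case the factor $aa$ would also have to belong to the set.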

Proposition 2.2 Let $L \subseteq \Gamma^\infty$ be a regular language. Then $L$ is factor open if and only if $L$ is
open in the $(2\,|\mathrm{Synt}(L)|)$-factor topology.

Proof: The implication from right to left is trivial. Let $n \ge 1$ be a natural number such that $L$ is
open in the $n$-factor topology and let $k = 2\,|\mathrm{Synt}(L)|$. The statement is trivially true for $n \le k$.
Let $h \colon \Gamma^* \to \mathrm{Synt}(L)$ be the syntactic homomorphism of $L$. It strongly recognizes $L$.

In the following we shall construct for each $\alpha \in L$ a $k$-factor open environment around $\alpha$ which
is contained in $L$. This is immediate if $\alpha$ is a finite word, so assume $\alpha \in \Gamma^\omega$.

For every word $x \in \Gamma^+$ of length at most $|\mathrm{Synt}(L)|$, we fix a word $f \in \Gamma^+$ of length at most
$|\mathrm{Synt}(L)|$ such that $h(xf) = h(x)$ and $h(f)$ is idempotent, if such a word $f$ exists. For every
word $w \in \Gamma^k$ there is a factorization $w = x_0 x_1 x$ and $f_0, f_1 \in \Gamma^+$ such that $|x_i| \le |\mathrm{Synt}(L)|$,
$h(x_i f_i) = h(x_i)$, and $h(f_i)$ is idempotent for $i = 0, 1$. Therefore, $w' = x_0 f_0 x_1 f_1 x$ has the same
image under $h$ as $w$. We use for the $f_i$'s the fixed $f$'s from above.

Let $A = \mathrm{im}_k(\alpha)$ and let $\alpha^{(1)}$ be obtained from $\alpha = \alpha^{(0)}$ by replacing infinitely many occurrences
of each $w \in A$ by $w'$ such that also infinitely many occurrences of each factor $w \in A$ remain
unchanged. By construction, we find a common linked pair $(s, e)$ for $\alpha$ and $\alpha^{(1)}$, i.e., $\alpha, \alpha^{(1)} \in
[s][e]^\omega$. Now, $\alpha \in L$ implies $[s][e]^\omega \subseteq L$ by strong recognition, and hence, $\alpha^{(1)} \in L$. We iterate
this procedure of pumping idempotents and we construct $\alpha^{(i+1)}$ from $\alpha^{(i)}$ until at some point
$\mathrm{im}_k(\alpha^{(i+1)}) = \mathrm{im}_k(\alpha^{(i)})$. Let $\alpha' = \alpha^{(i)}$ be the final iteration. We have $\alpha' \in L$.

Let $B = \mathrm{im}_n(\alpha')$. Since $L$ is $n$-factor open, for every sufficiently large prefix $u$ of $\alpha'$ we have
$\alpha' \in u \circ_n B^{\infty_n} \subseteq L$. Let $C = \mathrm{im}_k(\alpha')$. We have $\alpha' \in u \circ_k C^{\infty_k}$ and we claim $u \circ_k C^{\infty_k} \subseteq L$.

Let $\beta \in u \circ_k C^{\infty_k}$ and let $\beta = u x_1 x_2 \cdots$ such that $|x_i| \le |\mathrm{Synt}(L)|$ and for each $x_i$ (except maybe
for the last one, if $\beta$ is finite) there exists $f_i \in \Gamma^+$ such that $h(x_i f_i) = h(x_i)$ and $h(f_i)$ is
idempotent. Moreover, the $f_i$'s are in our fixed set of $f$'s from above. Consider $\beta' = u x_1 f_1^n x_2 f_2^n \cdots$
obtained from $\beta$ by "pumping idempotents". By construction of $\alpha'$, we have $\beta' \in u \circ_n B^{\infty_n} \subseteq L$
since every factor $f_i^n x_{i+1} f_{i+1}^n$ of $\beta'$ occurs infinitely often as a factor of $\alpha'$. By strong recognition,
we see that $\beta \in L$. Let $u$ be long enough, such that when removing all pumped $f_i$'s we obtain a
sufficiently large prefix $u^{(0)}$ of $\alpha$ such that $u^{(0)} x_1 x_2 \cdots \in L$, i.e., $\alpha \in u^{(0)} \circ_k C^{\infty_k} \subseteq L$. This shows
that $L$ is $k$-factor open. □

Proposition 2.3 It is decidable whether a regular language $L \subseteq \Gamma^\infty$ is factor open.

Proof: Lemma 2.5 below shows that for a given $k$ it is decidable whether $L$ is open in the $k$-factor
topology. Proposition 2.2 gives a bound on $k$. □

Lemma 2.4 Let $\mathcal{A}$ be a Büchi automaton and let $L \subseteq \Gamma^\infty$ be the language accepted by $\mathcal{A}$. For
any $k \ge 1$ a Büchi automaton accepting the $k$-factor interior of $L$ is effectively computable.

Proof: A word $\alpha \in \Gamma^\infty$ is in the interior of $L$ if and only if there exists an open set containing
$\alpha$ which is itself contained in $L$. If $\alpha$ is a finite word this is always true, so assume $\alpha \in \Gamma^\omega$. By
a product automaton construction we may assume without loss of generality that $\mathcal{A}$ always knows the last
$k - 1$ symbols $a_1 \cdots a_{k-1}$ from the input. Consider a state $q$ of $\mathcal{A}$ and a set $A \subseteq \Gamma^k$. We test whether, starting
from $q$, each word in $a_1 \cdots a_{k-1} \circ_k A^{\infty_k}$ has an accepting computation. This is possible because the
inclusion problem for Büchi automata is decidable.

Now, we modify the automaton as follows. During the computation we decide nondeterministically
whether the prefix $u$ read so far is long enough and if so we guess a set of $k$-factors $A \subseteq \Gamma^k$
which we want to allow in the future such that $u \circ_k A^{\infty_k}$ is accepted (meaning that $A$ has passed
the preceding test for the current state). With this choice we change to a new component which
accepts if and only if with each new symbol $a$ we have $a_1 \cdots a_{k-1} a \in A$. If we decide that the
prefix is not yet long enough, we continue in the normal computation of the original automaton.
All states of the original automaton are no longer final. Therefore, a word is accepted if and only
if there is a $k$-factor open subset containing the word which itself is contained in $L$. Thus the
constructed automaton accepts the interior of $L$. □

Lemma 2.5 Let $L \subseteq \Gamma^\infty$ be a regular language and $k \ge 1$ be a natural number. It is decidable
whether $L$ is open in the $k$-factor topology.

Proof: A language $L$ is open if and only if it equals its interior. Using Lemma 2.4, one can
construct an automaton for the interior of the language. Equivalence checking of the input
automaton and the automaton for its interior is decidable, see e.g. [21]. □

3 The first-order fragment $\Sigma_2$

One of our main results is a decidable characterization of the fragment $\Sigma_2[<,+1]$ over finite and
infinite words. It is a combination of a decidable algebraic and a decidable topological property.
For finite words only, this yields a new decidable algebraic characterization for dot-depth 3/2,
which in turn coincides with $\Sigma_2[<,+1]$ over finite words [27].

Theorem 3.1 Let $L \subseteq \Gamma^\infty$ be a regular language. The following are equivalent:

(1) $L$ is $\Sigma_2[<,+1]$-definable.

(2) $L$ is a factor polynomial.

(3) $L$ is factor open and there exists a surjective locally path-top homomorphism $h \colon \Gamma^* \to M$
which weakly recognizes $L$ such that $L$ is downward closed for $h$.

(4) $L$ is factor open and $\mathrm{Synt}(L)$ is locally path-top.

The proof of the preceding theorem is given at the end of this section. Next, we give a
counterpart of Theorem 3.1 for finite words, which in turn yields a new decidable characterization
of dot-depth 3/2. The first decidable characterization was discovered by Glaßer and Schmitz [9,
10]. It is based on so-called forbidden patterns. Later, a decidable algebraic characterization was
given by Pin and Weil [19].

Theorem 3.2 Let $L \subseteq \Gamma^*$ be a language. The following are equivalent over finite words:

(1) $L$ is $\Sigma_2[<,+1]$-definable over finite words.

(2) $L$ is a factor polynomial.

(3) $\mathrm{Synt}(L)$ is finite and locally path-top.

Proof: The language $\Gamma^*$ of finite words is definable in $\Sigma_2[<]$ by stating that there is a position
such that all other positions are smaller. Hence, if $L = \{w \in \Gamma^* \mid w \models \varphi\}$ for some $\varphi \in \Sigma_2[<,+1]$,
then there also exists some $\varphi' \in \Sigma_2[<,+1]$ such that $L = \{\alpha \in \Gamma^\infty \mid \alpha \models \varphi'\}$. Using Theorem 3.1,
this shows "1 ⇒ 2". Trivially, "2 ⇒ 3" follows from the same theorem. Finally, "3 ⇒ 1" uses the
fact that every language over finite words is factor open. □

The equivalence of (1) and (2) in Theorem 3.2 was also shown by Glaßer and Schmitz using different
techniques and with another formalism for defining factor polynomials [10]. As a corollary
of Theorem 3.1 and Theorem 3.2 we obtain the following decidability results.

Corollary 3.3 Let $L$ be a regular language.

(1) For $L \subseteq \Gamma^\infty$ it is decidable whether $L$ is $\Sigma_2[<,+1]$-definable.

(2) For $L \subseteq \Gamma^*$ it is decidable whether $L$ is $\Sigma_2[<,+1]$-definable over finite words.

(3) For $L \subseteq \Gamma^\omega$ it is decidable whether $L$ is $\Sigma_2[<,+1]$-definable over infinite words.

Proof: For "1" we note that the syntactic monoid is effectively computable. Therefore,
Theorem 3.1 (4) can be verified effectively by Lemma 2.1 and Proposition 2.3. Similarly, "2" follows
from the decidability of Theorem 3.2 (3). The set of finite words $\Gamma^*$ is definable in $\Sigma_2[<,+1]$ over
$\Gamma^\infty$. Hence, $L \subseteq \Gamma^\omega$ is $\Sigma_2[<,+1]$-definable over $\Gamma^\omega$ if and only if $L \cup \Gamma^*$ is $\Sigma_2[<,+1]$-definable
over $\Gamma^\infty$, and the latter condition is decidable by "1". Therefore, assertion "3" holds. □

By duality, the properties of $\Sigma_2[<,+1]$ in Theorem 3.1 yield a decidable characterization of
$\Pi_2[<,+1]$, which we state here for completeness.

Theorem 3.4 Let $L \subseteq \Gamma^\infty$ be a regular language. The following are equivalent:

(1) $L$ is $\Pi_2[<,+1]$-definable.

(2) $L$ is factor closed and $\mathrm{Synt}(L)$ is locally path-bottom.

Proof: The language $L$ is factor closed if and only if $\Gamma^\infty \setminus L$ is factor open and moreover, the
syntactic preorders of $L$ and its complement satisfy $s \le_L t$ if and only if $t \le_{\Gamma^\infty \setminus L} s$. Hence the
claim follows by the equivalence of (1) and (4) in Theorem 3.1, since $L \in \Pi_2[<,+1]$ if and only
if $\Gamma^\infty \setminus L \in \Sigma_2[<,+1]$. □

In the remainder of this section, we prove the respective steps required for Theorem 3.1.

Lemma 3.5 Let $L \subseteq \Gamma^\infty$ be defined by $\varphi \in \Sigma_2[<,+1]$ and let

\[ \varphi = \exists x_1 \ldots \exists x_k \forall y_1 \ldots \forall y_k \colon \psi(x_1, \ldots, x_k, y_1, \ldots, y_k). \]

Then $L$ is open in the $k$-factor topology.

Proof: Let $\alpha \models \varphi$. We construct a $k$-open environment of $\alpha$ contained in $L$. Let $x_1, \ldots, x_k$
be such that $\psi(x_1, \ldots, x_k, y_1, \ldots, y_k)$ is true on $\alpha$ for all $y_1, \ldots, y_k$. Choose a prefix $u$ of $\alpha$ and
$A \subseteq \Gamma^k$ such that $\alpha \in u \circ_k A^{\infty_k} \cap A^{\mathrm{im}_k}$ and $x_i + k < |u|$ for all $i$. We claim $u \circ_k A^{\infty_k} \subseteq L$. Suppose
$\beta \in u \circ_k A^{\infty_k}$ and $\beta \not\models \varphi$. This implies $\beta \not\models \psi(x_1, \ldots, x_k, y_1, \ldots, y_k)$ for the positions $x_i$ from above
and for some positions $y_i$.

Consider $Y := \{y_1, \ldots, y_\ell\} \subseteq \{y_1, \ldots, y_k\}$ with $\ell$ maximal such that $y_{i+1} = y_i + 1$ for $1 \le i < \ell$,
i.e., a maximal factor covered by the positions $y_i$. Take the $Y$ such that $\min Y$ is minimal. First
consider the case $y_1 \le \max\{x_i \mid 1 \le i \le k\}$. Since $\ell \le k$ we see that all positions $y_i$ stay in the
prefix $u$ and we can use the same positions in $\alpha$. Now consider the case $y_1 > \max\{x_i \mid 1 \le i \le k\}$. Since all factors of
length $k$ appear infinitely often and $\ell \le k$, we see that we find the factor $\beta([y_1; y_\ell])$ in $\alpha$ and we
may choose this factor in such a way that $y_1$ is greater than the positions of all variables already
set in $\alpha$. Hence we can set the variables corresponding to those in $Y$ to the respective positions
of this factor. By induction on the number of such sets $Y$, we get a distribution of the $y_i$ in $\alpha$
with the same labels as the $y_i$ in $\beta$ and such that the same relations with respect to the order
and successor predicate hold. Hence this distribution makes $\psi(x_1, \ldots, y_k)$ false on $\alpha$, which is a
contradiction. □

Lemma 3.6 If $L \subseteq \Gamma^\infty$ is $\Sigma_2[<,+1]$-definable, then $\mathrm{Synt}(L)$ is locally path-top.

Proof: Let $L = L(\varphi)$ with $\varphi = \exists x_1 \ldots \exists x_k \forall y_1 \ldots \forall y_k \colon \psi(x_1, \ldots, y_k)$. Consider an idempotent $e$
of $\mathrm{Synt}(L)$ and $p \in P_e$. We want to show that $epe \le_L e$.

Consider xq,... ,Xm G T*, /i, . . . , € r+ such that /j ee^ Jf iox I < i < m, e <tz xq/i, 
e <c fmXm and e <^ fiXifi+i in Synt(L) for 1 < i < m - 1. Let fi = and 

■ Hin—lfrn—lXm—lfm ' UmfrnXm &ud 

p = Xo/lXi • • • fraXm- 

By these properties and idempotency of e, we see that there exist yi,...,ym € L* such that 
e =L e. Moreover, every p £ P^ has such a representation, i.e., p =l P- Note that > A; and 
thus no factor of length k can cover fi in total. Let 

We view positions of $\beta$ as a subset of the positions of $\alpha$ by omitting those positions of $\alpha$ originating
from the word $p$. Assume $\beta \models \varphi$, and let $x_i$ be such that $\psi(x_1, \ldots, y_k)$ is true on $\beta$ for all $y_j$. We
claim that on $\alpha$ there is an assignment $x'_i$ such that $\psi(x'_1, \ldots, y'_k)$ holds for all $y'_j$.

We construct the assignment $x'_i$ by the following process. For all variables $x_i$ lying in $u$, $v$ or
$w^\omega$ we set $x'_i = x_i$. Assume without restriction that the remaining variables are $x_1 < \cdots < x_\ell$.
Let $X_{i,j} = \{x_i, \ldots, x_j\}$ and write $x \ll y$ whenever $y - x > (k + 1) \cdot |e|$ (intuitively this means
that $y$ and $x$ are "far away" from each other). We start with $X_{1,\ell}$ and repeat the following until
$X_{i,j}$ is empty:

• If not Xi <C Xi-i then we set then we set x'i so that x'^ — x'i_i = Xi — Xi^i and proceed with 
Xj+ij; else 

• if not Xj <^ Xj+i then we set then we set x'j so that and proceed 
with Xij^i; else 

• we have Xi <C Xi^i and Xj <^ Xj^i. In this case x[ is set to the position within e such that 
(k + 1) |e| < x[_-^ — Xi < {k + 2) [e|, i.e., between x'-_^ and the factor e appears k + 1 
times. Then we proceed with Xj+ij-. 

By construction, the variables $x'_i$ on $\alpha$ have the same label, relative order and successor relationship
as the variables $x_i$ have on $\beta$: Although the variables may be placed in different factors $e$,
the relative position within such a factor is the same for all corresponding variables. Now, one
can show that for an assignment $y'_i$ such that $\alpha \not\models \psi(x'_1, \ldots, y'_k)$ we find an assignment $y_i$ such
that $\beta \not\models \psi(x_1, \ldots, y_k)$, contradicting the assumption. The basic idea is that, since the $f_i$ are
long, all factors in $p$ of length at most $k$ also appear in $e$. Moreover, if a factor appears at least
$k + 1$ times between two variables $x'_i$ and $x'_j$ in $\alpha$ then the same holds true in $\beta$ for the variables
$x_i$ and $x_j$.

Similarly, one can show that $u(e^{2(k+1)} v)^\omega \models \varphi$ implies $u(e^{k+1} p\, e^{k+1} v)^\omega \models \varphi$. In total we get
$epe \equiv_L e^{k+1} p\, e^{k+1} \le_L e^{2(k+1)} \equiv_L e$. Since this holds for all idempotents $e$ and all $p \in P_e$, all
idempotents of $\mathrm{Synt}(L)$ are locally path-top. □

The next lemma deals with the fragment $\Sigma_2[<,+1]$ over finite words, a special case which we
will need for proving Theorem 3.1. An important tool in its proof are factorization forests.
Let $M$ be a finite monoid and let $h : \Gamma^* \to M$ be a homomorphism. A factorization forest assigns
to each word $w \in \Gamma^{\ge 2}$ a factorization

$d(w) = (w_1, \ldots, w_n)$ with $n \ge 2$, $w = w_1 \cdots w_n$ and $w_i \in \Gamma^+$

such that $n \ge 3$ implies that $h(w) = h(w_1) = \cdots = h(w_n)$ is idempotent. The height $t(w)$ of $w$ is
defined by $t(a) = 0$ for leaves $a \in \Gamma$ and $t(w) = 1 + \max\{t(w_1), \ldots, t(w_n)\}$ if $d(w) = (w_1, \ldots, w_n)$.
Simon's Factorization Forest Theorem [20] states that for every homomorphism $h : \Gamma^* \to M$ to a
finite monoid $M$ there exists a number $t_{\max} \in \mathbb{N}$ and a factorization forest $d$ such that $t(w) \le t_{\max}$
for all $w \in \Gamma^+$. In particular, $t_{\max}$ does not depend on $|w|$.
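The definition of a factorization forest can be checked mechanically on small examples. The following is a minimal sketch, not taken from the paper: a forest is encoded as nested tuples whose leaves are single letters, and the names `evaluate`, `flatten` and `height` as well as the encoding of the monoid by a letter map `h`, a product `mul` and a neutral element `unit` are assumptions made for illustration.

```python
from functools import reduce


def evaluate(word, h, mul, unit):
    """Image of a word under the homomorphism induced by the letter map h."""
    return reduce(mul, (h[a] for a in word), unit)


def flatten(tree):
    """Word spelled by a factorization tree (leaves are single letters)."""
    return tree if isinstance(tree, str) else "".join(flatten(c) for c in tree)


def height(tree, h, mul, unit):
    """Return the height t(w) and check the forest conditions on the way."""
    if isinstance(tree, str):
        assert len(tree) == 1, "leaves must be single letters"
        return 0
    assert len(tree) >= 2, "inner nodes need at least two children"
    if len(tree) >= 3:
        # All children and the node itself must map to one idempotent.
        images = {evaluate(flatten(c), h, mul, unit) for c in tree}
        e = evaluate(flatten(tree), h, mul, unit)
        assert images == {e} and mul(e, e) == e, "idempotency condition violated"
    return 1 + max(height(c, h, mul, unit) for c in tree)
```

For instance, with the two-element monoid $(\{0,1\}, \max)$ and `h = {"a": 0, "b": 1}`, the tree `(("a", ("b", "b", "b")), "a")` is a valid factorization of `abbba`, whereas `("a", "b", "b")` is rejected because its three children do not share one idempotent image.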

Lemma 3.7 Let $h : \Gamma^* \to M$ be a surjective homomorphism onto a finite monoid with all
idempotents being locally path-top. If $h$ recognizes $L \subseteq \Gamma^*$, then $L$ is a $(2|M|)$-factor polynomial.

Proof: Let $\Gamma_k = \{w \in \Gamma^* \mid k \le |w| < 2k\}$. Now, every homomorphism $h : \Gamma^* \to M$ induces
a homomorphism $h_k : \Gamma_k^* \to M$ by setting $h_k(w) = h(w)$ for all $w \in \Gamma_k$. If we apply the
Factorization Forest Theorem to $h_k$, then we obtain a factorization forest for $h : \Gamma^{\ge k} \to M$ of
finite height, with leaves being factors of length between $k$ and $2k$, since every word in $\Gamma^{\ge k}$ can
be factored into factors in $\Gamma_k$.
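The last step, that every word of length at least $k$ splits into blocks of length between $k$ and $2k$, can be made concrete as follows. This is only a sketch under the assumption that a block has length at least $k$ and less than $2k$; the function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(w, k):
    """Factor a word of length >= k into blocks with k <= len(block) < 2k."""
    if k < 1 or len(w) < k:
        raise ValueError("need k >= 1 and a word of length at least k")
    # Cut greedily into blocks of length exactly k.
    blocks = [w[i:i + k] for i in range(0, len(w), k)]
    # A short remainder (fewer than k letters) is merged into the previous block,
    # which then has length k + remainder < 2k.
    if len(blocks[-1]) < k:
        remainder = blocks.pop()
        blocks[-1] += remainder
    return blocks
```

Each block can then serve as a leaf of the forest, so the leaves are words rather than single letters.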

Let $k = 2|M|$ and let $d$ be a factorization forest of finite height with leaves in $\Gamma_k$. By induction
on the height $t(w)$ of a word $w$ of length at least $k$, we show that there exists a $k$-factor monomial
$P(w)$ with degree depending only on $t(w)$ and $|M|$ such that $w \in P(w)$ and for all $u \in P(w)$ we
have $h(u) \le h(w)$. Moreover, each $P(w)$ starts and ends with a word of length at least $k$ (instead
of starting and ending with a term of the form $A^{\circledast}$).

For leaves $w \in \Gamma_k$ we set $P(w) = w$. If $d(w) = (w_1, w_2)$, then $P(w) = P(w_1) \cdot P(w_2)$ where the
dot denotes the usual concatenation. This yields a $k$-factor polynomial, since both $P(w_1)$ and
$P(w_2)$ start and end with words of length at least $k$. Let now $d(w) = (w_1, \ldots, w_n)$ with $n \ge 3$
and let $e = h(w) = h(w_i)$ be the corresponding idempotent. Let $v = w_2 \cdots w_{n-1}$ be the product
of the inner factors and let $A = \mathrm{alph}_k(v)$. If $|v| < 2k$, then we set $P(w) = P(w_1) \cdot v \cdot P(w_n)$.
Hence, we can assume $v = s v' t$ with $s, t \in \Gamma^k$. We set

$$P(w) = P(w_1) \cdot s \circ A^{\circledast} \circ t \cdot P(w_n).$$

Obviously, $w \in P(w)$. Let $u \in P(w)$ and write $u = u_1 s u' t u_n$ with $u_i \in P(w_i)$ and $s u' t \in s \circ A^{\circledast} \circ t$.
We can factorize $s u' t = x_0 \cdots x_m$ such that $0 < |x_i| \le |M|$ and for each $1 \le i \le m$ there exists
$f_i \in \Gamma^+$ such that $h(x_i) = h(f_i x_i)$ and $h(f_i)$ is idempotent. By construction of $A$, each word
$x_i x_{i+1}$ is a factor of $v$ and hence

$$e \le_{\mathcal{J}} h(x_i x_{i+1}) \le_{\mathcal{J}} h(f_i x_i f_{i+1})$$

for each $1 \le i < m$. Moreover, by construction of $s$ and $t$ we see that $x_0 x_1$ is a prefix of $w_2$ and
that $x_m$ is a suffix of $w_{n-1}$. Together with $e = h(w_2) = h(w_{n-1})$, we obtain

$$e \le_{\mathcal{R}} h(x_0 x_1) \le_{\mathcal{R}} h(x_0 f_1) \quad\text{and}\quad e \le_{\mathcal{L}} h(x_m) = h(f_m x_m).$$

By assumption, $e$ is locally path-top. Hence $e\, h(x_0 f_1 x_1 \cdots f_m x_m)\, e \le e$ in $M$. Putting everything
together yields

$$\begin{aligned}
h(u) &= h(u_1)\, h(x_0 \cdots x_m)\, h(u_n) \\
     &\le h(w_1)\, h(x_0 f_1 x_1 \cdots f_m x_m)\, h(w_n) \\
     &= e\, h(x_0 f_1 x_1 \cdots f_m x_m)\, e \le e = h(w).
\end{aligned}$$
In all cases of the induction, the degree of P{w) is bounded by S*^"') • k < 2^\^''\ where the last 
bound follows from the Factorization Forest Theorem for aperiodic monoids [13]. 

Let $L_1 = \{w \in L \mid |w| < k\}$ and let $L_2 = L \setminus L_1$. Since $L$ is recognized by $h$ we see that

$$L_2 = \bigcup_{w \in L_2} P(w)$$

and this union is finite since there are only finitely many $k$-factor monomials of degree at most
$2^{4|M|}$. Therefore, $L = L_1 \cup L_2$ is a $k$-factor polynomial. □



Lemma 3.8 Let $L \subseteq \Gamma^\infty$ be a regular language. Let $L$ be factor open and weakly recognized by
a surjective locally path-top homomorphism $h : \Gamma^* \to M$ onto a finite monoid such that $L$ is
downward closed on finite prefixes for $h$. Then $L$ is a factor polynomial.

Proof: Let $L$ be $n$-open and let $k = \max\{2|M|, n\}$. Let $\alpha \in L$. Since $L$ is $n$-factor open, it is
$k$-open. Hence, there exists $u \in \Gamma^*$ and $A = \mathrm{im}_k(\alpha)$ with $\alpha \in u \circ A^{\circledast} \subseteq L$. Since $h : \Gamma^* \to M$ is
locally path-top, we know that the language $P = \{v \in \Gamma^* \mid h(v) \le h(u)\}$ over finite words is a
$k$-factor polynomial by Lemma 3.7. Moreover, we may assume that the suffix of length $k$ is explicit
in all monomials of $P$. We define the factor polynomial $P_\alpha = P \circ A^{\circledast}$ and show $L = \bigcup_{\alpha \in L} P_\alpha$.
Since $\alpha \in P_\alpha$ is trivial, it remains to show $P_\alpha \subseteq L$ for each $\alpha \in L$.

Let $v \in P$ and $\beta \in A^{\circledast}$ such that $v \circ \beta$ is defined. We have $u \circ \beta \in L$. Consider a linked pair
$(s, e)$ with $u \circ \beta \in [s][e]^\omega \subseteq L$ and a factorization $u \circ \beta = u w \gamma$ such that $uw \in [s]$ and $\gamma \in [e]^\omega$.
Let $t = h(vw)$, then $v \circ \beta = v w \gamma \in [t][e]^\omega$. Moreover $t \le s$ and since $L$ is downward closed we
have $[t][e]^\omega \subseteq L$. □



Lemma 3.9 Let $L \subseteq \Gamma^\infty$ be a factor monomial. Then $L$ is definable in $\Sigma_2[<,+1]$.

Proof: Let $L = A_1^{\circledast} \circ u_1 \circ \cdots \circ A_s^{\circledast} \circ u_s \circ A_{s+1}^{\circledast}$ for $A_i \subseteq \Gamma^k$, $u_i \in \Gamma^{\ge k}$. Without restriction we
assume $|u_i| = k$. Consider the formula

$$\exists x_1 \ldots \exists x_s \, \forall y : \bigwedge_{1 \le i \le s} \bigl(\lambda(x_i) = u_i \wedge x_i < x_{i+1}\bigr) \wedge \bigwedge_{1 \le i \le s+1} \psi_i. \qquad (2)$$

The first conjunction states that for each $i$, $x_i$ is the position of the marker $u_i$, and that the
markers appear in the correct order. The formula $\psi_i$ imposes the factor alphabetic restriction $A_i$
between $u_{i-1}$ and $u_i$. More precisely, $\psi_i$ is set to $x_{i-1} < y < x_i \rightarrow \lambda(y) \in A_i$. In these formulas,
we use the conventions $x_0 = 0$ and $x_{s+1} = \infty$; the expression $\lambda(x_i) = u_i$ (resp. $\lambda(y) \in A_i$) is an
abbreviation saying that at position $x_i$ the factor $u_i$ begins (resp. at position $y$ some factor in
$A_i$ begins). These abbreviations are readily replaced in such a way that the formula remains in
$\Sigma_2[<,+1]$. Therefore, $L$ is defined by the $\Sigma_2[<,+1]$-formula given in (2). □

We conclude this section with the proof of Theorem 3.1. 

Proof (Theorem 3.1): "1 $\Rightarrow$ 4": This is Lemma 3.5 together with Lemma 3.6.

"4 $\Rightarrow$ 3": Strong recognition implies weak recognition. The claim follows because the syntactic
homomorphism $h_L : \Gamma^* \to \mathrm{Synt}(L)$ strongly recognizes $L$.

"3 $\Rightarrow$ 2": This follows from Lemma 3.8.

"2 $\Rightarrow$ 1": Let $L$ be a union of factor monomials and a (finite) set $K$ of words of length less
than $k$. By Lemma 3.9 each monomial is definable in $\Sigma_2[<,+1]$ and of course so is $K$. The result
follows since $\Sigma_2[<,+1]$ is closed under union. □
4 First-order logic with two variables

In this section, we consider two-variable first-order logic with order and successor predicates
$[<,+1]$ over finite and infinite words. The fragment $\mathrm{FO}^2[<,+1]$ admits a temporal logic counterpart
having the same expressive power [8]. It is based on unary modalities only. Wilke [31] has
shown that membership is decidable for $\mathrm{FO}^2[<,+1]$. We complement these results by giving
a simple algebraic characterization of this fragment. An important concept in our proof is a
refinement of the factor topology. A set of the form A® is definable in F0^[<,+1] but it is 
neither open nor closed in the factor topology. This observation leads to the strict k-factor 
topology. A basis of this topology is given by all sets of the form u o A® n A® for u € T* and 
A C r'^. We do not use this topology outside this section. Using the refined topology and the 
class LDA we can now state the following theorem. 

Theorem 4.1 Let $L \subseteq \Gamma^\infty$ be a regular language. The following are equivalent:

(1) $L$ is $\mathrm{FO}^2[<,+1]$-definable.

(2) $L$ is weakly recognized by some homomorphism $h : \Gamma^* \to M \in \mathbf{LDA}$ and closed in the strict
$(2|M|)$-factor topology.

(3) $\mathrm{Synt}(L) \in \mathbf{LDA}$.

The proof of the above theorem can be found at the end of this section. The syntactic monoid 
of a regular language is effectively computable. Hence, one can verify whether property (3) in 
Theorem 4.1 holds. Since both $\Gamma^*$ and $\Gamma^\omega$ are $\mathrm{FO}^2[<,+1]$-definable over $\Gamma^\infty$, this immediately
gives us the following corollary. 

Corollary 4.2 Let $L$ be a regular language.

(1) For $L \subseteq \Gamma^\infty$ it is decidable whether $L$ is $\mathrm{FO}^2[<,+1]$-definable.

(2) For $L \subseteq \Gamma^*$ it is decidable whether $L$ is $\mathrm{FO}^2[<,+1]$-definable over finite words.

(3) For $L \subseteq \Gamma^\omega$ it is decidable whether $L$ is $\mathrm{FO}^2[<,+1]$-definable over infinite words. □
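To illustrate why property (3) of Theorem 4.1 can be verified on a computed syntactic monoid, here is a brute-force sketch. It assumes that membership in LDA is expressed by the identity $(exeye)^\omega\, exe\, (exeye)^\omega = (exeye)^\omega$ for idempotents $e$, which is the identity appearing in the proof of Proposition 4.3; for simplicity it ranges over all idempotents of the given finite structure. The names `idempotent_power` and `is_lda` and the representation of the monoid by a list of elements plus a product function are hypothetical.

```python
from itertools import product


def idempotent_power(x, mul):
    """Return the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def is_lda(elements, mul):
    """Check (exeye)^w exe (exeye)^w = (exeye)^w for all idempotents e and all x, y."""
    idempotents = [e for e in elements if mul(e, e) == e]
    for e, x, y in product(idempotents, elements, elements):
        exe = mul(mul(e, x), e)
        g = idempotent_power(mul(mul(exe, y), e), mul)  # (exeye)^omega
        if mul(mul(g, exe), g) != g:
            return False
    return True
```

On the two-element semilattice `is_lda([0, 1], lambda a, b: a * b)` holds, while the cyclic group of order two fails the test, as expected for a monoid that is not aperiodic.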

The following proposition relates monoids in LDA with monoids which are simultaneously
locally path-top and locally path-bottom. It is a useful tool in the proof of Theorem 4.1. Moreover,
it immediately follows that $\Delta_2[<,+1]$ is a subset of $\mathrm{FO}^2[<,+1]$. We will further explore
the relation between these two fragments in the next section.

Proposition 4.3 Let $M$ be finite and let $h : \Gamma^* \to M$ be a homomorphism of monoids. The
following are equivalent:

(1) $M \in \mathbf{LDA}$.

(2) $e P_e e = e$ for all idempotents $e$ of $M$.
Proof: "1 2" Let n E N with o" = a for all a G M. First suppose m = 1. In particular, 
/i = /m and = Xm- Let e = xofib = c/ixi for some b,c G M. Then 

e = {xofibcfiXi)'^xofib 
= xo{fibcfiXiXofi)"-b 

= xo(/i6c/iXiXo/i)" fixixofi {fibcfiXiXQfiYb by /i € LDA 
= {xQfib cf iXiY xofixi {xofibcfiXiYxofib 
= exofixie. 

Let now m > 1 and let e = bfiXif2C for some b,c M. Set x'j^ = By induction we see 

that 

Set x'/ = X1/2 • • • Xm-ifmXme- From the case m = 1 we obtain 

e = exo/ix'/e = exo/ixi • • • fmXme. 

Note that indeed e <7^ and e <£ /ix'/. 

"2 $\Rightarrow$ 1": Let $e \in h(\Gamma^+) \subseteq M$ be idempotent and $x, y \in M$. Setting $g = (exeye)^n$, $x_0 = f_1 =
e = f_2 = x_2$, $x_1 = x$ we see that $x_0 f_1 x_1 f_2 x_2 = exe \in P_g$. Therefore, $(exeye)^n\, exe\, (exeye)^n =
g\, exe\, g = g = (exeye)^n$ and hence $M \in \mathbf{LDA}$. □



Example 4.4 Let $\Gamma = \{a, b, c\}$. Consider the language $L_1 = \Gamma^* a b^* a \Gamma^\infty$ consisting of all words
such that there are two $a$'s that only contain $b$'s in between. It is easy to see that $L_1$ is $\Sigma_2[<]$-definable.
Next, we will show that $L_1$ is not $\mathrm{FO}^2[<,+1]$-definable. Choose $n \in \mathbb{N}$ such that $s^n$ is
idempotent for every $s \in \mathrm{Synt}(L_1)$. Then

$$(b^n a b^n c b^n)^n \notin L_1 \quad\text{whereas}\quad (b^n a b^n c b^n)^n\, b^n a b^n\, (b^n a b^n c b^n)^n \in L_1.$$

This shows that $\mathrm{Synt}(L_1)$ is not in LDA. By Theorem 4.1 we conclude that $L_1$ is not $\mathrm{FO}^2[<,+1]$-definable.
Similarly, $L_2 = \Gamma^\infty \setminus L_1$ is definable in $\Pi_2[<]$ but not in $\mathrm{FO}^2[<,+1]$. ◇
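The two witness words of this example can be checked directly on finite words. The sketch below assumes that the witnesses are $(b^n a b^n c b^n)^n$ and $(b^n a b^n c b^n)^n\, b^n a b^n\, (b^n a b^n c b^n)^n$, and that a finite word belongs to $L_1$ exactly if it contains a factor of the form $a b^* a$; the helper names `in_L1` and `lda_witnesses` are hypothetical.

```python
import re


def in_L1(w):
    """A finite word is in L1 iff it has two a's with only b's in between."""
    return re.search(r"ab*a", w) is not None


def lda_witnesses(n):
    """Return the pair ((exeye)^n, (exeye)^n exe (exeye)^n) for e = b^n, x = a, y = c."""
    e = "b" * n
    exe = e + "a" + e
    exeye = exe + "c" + e
    return exeye * n, exeye * n + exe + exeye * n
```

In the first word every two consecutive $a$'s are separated by a $c$, whereas inserting $b^n a b^n$ creates a factor $a b^{2n} a$.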

Lemma 4.5 Let $L \subseteq \Gamma^\infty$ be definable in $\mathrm{FO}^2[<,+1]$. Then $\mathrm{Synt}(L) \in \mathbf{LDA}$.

Proof: Let L be defined by a F0^[<, +l]-formula of quantifier depth m. Choose e € r+, s, t £ T* 
and n > m such that all n-powers are idempotent in Synt(L). Let e = e". Note that |e| > m, 
i.e., no factor of length at most m can cover the whole factor e. We show in the following that 
{eseteYese{eseteY =L (esete)^". 

Let u, V, w £ F* and a = u{eseteYese{eseteYvw^ and /3 = u{eseteY{e-seteYvw'^ . We identify 
the positions of /3 with a subset of the positions in a in the natural way. Note that in particular 
the successor of the last position in the prefix u{eseteY of (3 is the first position of the suffix 
{eseteYvw^ . We use x, y to designate positions of a and x', y' for positions of /?. 

We define balls Bi around the difference of a and /? in the following way: 

' V 

a = u{eseteY~^~^ esete {eseteY ese{esetey esete{eseteY~^~^vw^ . 



Therefore, the set of positions of (3 are all positions of a except those that are in Bq. 
The $i$-context $\lambda_i(z)$ of a position $z$ on a word $\gamma$ is the factor induced by the positions $[z - i; z + i]$
(which may be shorter than $2i + 1$ if $z$ lies near the boundary of $\gamma$). We say that a tuple $(x, y, x', y')$
is $i$-legal if

$x', y' \notin B_0$,

$x = y$ iff $x' = y'$,

$x = y \pm 1$ iff $x' = y' \pm 1$,

$x < y$ iff $x' < y'$,

$\lambda_{m-i}(x) = \lambda_{m-i}(x')$ and $\lambda_{m-i}(y) = \lambda_{m-i}(y')$.

The idea is that $x', y'$ are positions in $\beta$ and the configuration cannot be distinguished by atomic
formulas and checking the contexts of the positions up to width $m - i$. We say that $(x, y, x', y')$
is $i$-close if it is $i$-legal and

$x \neq x' \Rightarrow x, x' \in B_i$,

$y \neq y' \Rightarrow y, y' \in B_i$,

that is, in addition to being legal, the respective positions either are the same or, if they are not
the same, then they are both "not too far" from $B_0$. So if either $z$ or $z'$ is not in $B_i$ we can deduce
that $z = z'$.
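The notion of an $i$-context is easy to compute on finite words. The following sketch assumes 0-based positions and reads the interval as $[z - i; z + i]$, in line with the stated maximal length $2i + 1$; the function name `context` is hypothetical.

```python
def context(word, z, i):
    """Factor of `word` induced by the positions z - i, ..., z + i (clipped at the borders)."""
    if not 0 <= z < len(word):
        raise IndexError("position outside the word")
    return word[max(0, z - i): z + i + 1]
```

Two positions with the same $(m - i)$-context carry the same label and have the same neighbourhood up to distance $m - i$, which is the kind of information the legality conditions compare.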

At the beginning, $x = y = x' = y'$ is the first position in $\alpha$ and $\beta$, and this configuration
is 0-close since $x'$ and $y'$ cannot be in $B_0$ because $e$ is longer than $m$.

Now we claim that if $(x, y, x', y')$ is $i$-close and $\varphi(x, y) \in \mathrm{FO}^2[<,+1]$ has quantifier depth at most $m - i$,
then

$$\alpha, x, y \models \varphi(x, y) \iff \beta, x', y' \models \varphi(x', y').$$

For i = n this is immediate due to the fact that the situation is 0-legal, all atomic formulas agree 
on their value on a and (3. Let now i < n. We may assume without loss that (/^(x, y) = 3x : ^l^{x, y). 
Let a,x,y \= Lp{x,y). Then there is x such that ip{x,y) is true on a. First consider the case 
X = y. We set x' = y' and see that (x,y,x',y') is {i + l)-close. Note that here we use that e 
is long enough so that a context "near the middle" cannot extend into the s in Bq. Hence by 
induction il){x' ,y') is true on /3 and therefore /3,x',y' \= ip{x',y'). 

Consider now the case $x = y \pm 1$. Then we set $x' = y' \pm 1$. This situation is again $(i + 1)$-close
(here we use that in $\beta$ the successor of the last position before $B_0$ is the first position after $B_0$).

Now consider x + l<y. If x ^ Bi then we set x' = x. In this situation we have x' + 1 < y'. 
Moreover we have equality only if y' is the first position in Bi. Hence, by choice of e, we find a 
position x' in -Bi+i with the same m — {i + l)-context such that x' + 1 < y' and we obtain an 
{i + l)-close situation for both cases. If x + 1 < y and x & Bi we choose x' G -B^+i \ Bi to the 
first position with the same m — + context. This is possible, again by choice of e. We handle 
X > y + 1 similarly. We showed that starting with an i-close situation, we always can assure an 
(i + l)-close situation for ^l^{x, y) with quantifier depth < m — i — 1. By induction this shows that 
a,x,y \= (p{x,y) implies f3,x',y' \= ip{x',y'). The reverse implication is obtained by a symmetric 
argumentation. 

Taking $i = 0$ in the claim above the first requirement in (1) follows. By similar arguments
we see that formulas of quantifier depth at most $m$ agree on the words $(u (esete)^n ese (esete)^n v)^\omega$
and $(u (esete)^n (esete)^n v)^\omega$. This shows that $h_L : \Gamma^* \to \mathrm{Synt}(L)$ is in LDA. □

Lemma 4.6 If $L \subseteq \Gamma^\infty$ is recognized by $h : \Gamma^* \to M$ in LDA, then $L$ is clopen in the strict
$k$-factor topology for every $k \ge 2|M|$.
Proof: Since T°° \L is also recognized by h, it suffices to show that L is open. Let a G [s] [e]^ C L 
for some linked pair (s, e) € and let ^ = imk{a) / 0. We write a = 596162 • • • with h{so) = s, 
h{ei) = e, and 6162 • • • € vl®. Moreover, we can assume |6i| > A; and afc(6j) = A for each i > 1. 
Let ri be the prefix of ei of length k — I. We have a G sq^^i ° ^® n A®". 

We show that son o A® D yl® C L which proves the claim. Let /3 G sori o ^® n A® and 
write /3 = sorir2fif2 ■ ■ ■ such that / = = ^(/2) = ■ ■ ■ and (/i(rir2), /) is a linked pair with 

alphj.(/j) = A for all i > 1. Let r = h{rir2). 

We factorize rir2/i = xo^i • • • such that < |M| and for each Xi, i < m there exists an 
idempotent gi^i G ^(L"*") C M with h{xi)gij^i = h{xi). By construction of A; and ri we see that 
xq is a prefix of ri. Hence, 

6 <7e /i(n) <n h{xo) = h{xo)gi. 

By choice of A and 61, we see that for all < z < m, the word factor of 61. Hence, 

for all 1 < i < m we have 

6 <j h{xi-iXi) = h{xi-i)gi h{xi) gi+i <j gi h{xi) gi+i. 

factor of 61, there exists to ^ T* such that Xm-iXmto is a suffix of 61. With 
t = h{to) we see that 

6 <£ h{Xm-lXm)t = h{Xm~i)gmh{Xm)t <£ gmh{Xm)t. 

By Proposition 4.3 we see that 

6 = 6/i(xo)gi/i(xi) • • • gmh{xm)te = eh{rir2fi)te = erfte 

Similarly, using alph^(/j) = A, we obtain p,q G M with / = fpeqf. Since M is aperiodic, there 
exists n G N such that = a^~^^ for all a G M. It follows 

6 = er fpeqf te = {erfp^ e {qfteY = {erfpY^^ e {qfte)"- = erfpe 

and similarly, 

/ = fperfteqf = {fperT f {teqfT = {fperT^^ f {teqfT = fperf. 

We have s = se = serfpe = srfpe and therefore, [s][6]'^ = [srfpe][erfpe]'^ C L. By strong recog- 
nition and since [srfpe][erfpe]^r) [srf][fperf]^ 7^ 0, we conclude f3 G [sJ'li/]'^ = [srf][fperf]'^ C 
L. This shows that every infinite word in L has an open environment contained in L. Every 
finite word w has a trivial open environment {w}. Therefore L is open. □ 

Lemma 4.7 If $L \subseteq \Gamma^\infty$ is weakly recognized by $h : \Gamma^* \to M$ in LDA and if $L$ is closed in the
strict $k$-factor topology for some $k \ge 2|M|$, then $L$ is definable in $\mathrm{FO}^2[<,+1]$.

Proof: Let a G L and A = imfc(Q). We can assume that A = {wi, . . . ,Ws} ^ $ because LnT* is 
definable in FO^ [<,+!], see e.g. [14]. Write a = u ■ w ■ /S with w alph^.(lastfc_i(w) • /3) and w is 
the last factor in a which occurs only finitely often. If all factors occur infinitely often, then we 
set a = f3. In the remainder, we assume that some factor appears finitely often; the other case is 
similar. Let r be the Ramsey number for monochromatic triangles when using |M| colors. We 
consider the following factorization of (3: 

j3 = U1V1U2 ■ ■ ■ UrsVrsl 
where Vi+i = mod s)+i ^-i^d Vi ^ alph^(nj firstfc_i(t;j)), i.e., fj+i is always the first occurrence 
of this factor after Vi and we iterate seeing all factors in A for r-many times. We write Ui for 
the set of words in fi [{A \ {fj})® U T^'^) which do not end with some non-empty suffix x, 

\x\ < k, such that Vi is a prefix of xvi, i.e., no word in Ui ■ firstfc_i(t;j) admits Vi as a factor. We 
define 

P{a) = [h{u)] ■ W ■ (a® nUiVi ■■■ Urs Vrs o A®) H A® 

We have $\alpha \in P(\alpha)$. The remainder of the proof is divided into two parts. First, we show
$P(\alpha) \subseteq L$ and second, we show that $P(\alpha)$ is definable by some formula $\varphi_\alpha \in \mathrm{FO}^2[<,+1]$ where
the size of $\varphi_\alpha$ only depends on $M$ and $k$, but not on $\alpha$.

By choice of r, there exists a £ M and an idempotent e € M such that every word a' € P{a) 
(including a itself) admits a factorization a' = u' ■ w ■ x'e'^e2/3' with h{u') = h{u), h{x') = a, 
h{ei) = h{e'2) = e, alphjrj(e'^) = alphjrj(e2) = alphj^{x'e[e2(3') = A = \mk{x' e'le^l^')- For a we use 
the fixed factorization a = u-w ■ xeie2/3" . Let now a' = u' -w ■ x't'^e^P' € P{oi) be some arbitrary 
word in P{a). We want to show that a' £ L = L. 

Let z' be a finite prefix of Let z the suffix of e'2z' of length k. By construction z is a 
factor of ei, i.e., ei = yizy2 for some 2/1,2/2 € F*. Now, x'e[e'2z' ■ 2/262/3" G vl® n A®. We 
claim that u' ■ w ■ x'e'ie'2z' ■ 2/262/?" G To this end, it suffices to show /i(62 2;' 2/2 62) = 6. 
We factorize z'2/2 = 2:0 ■ ■ ■ a^m with < \xi\ < \M\ such that for every i > the exists an 
idempotent fi € h{T^) such that h[xi-i) = h[xi-i) fi. By construction and since k > 2 \M\ 
we have e <n /i(xo)/i, 6 <£ fmh{xm), e <j fih{xi)fi+i (cf. proof of Lemma 4.6). Using 
Proposition 4.3, we conclude h{e2 z' y2 62) = e. 

Now, we show that $P(\alpha)$ is defined by some formula in $\mathrm{FO}^2[<,+1]$. In order to provide a concise
notation, we introduce macros $\lambda(x) = w$ for a finite word $w$ expressing that the factor $w$ starts
at position $x$; $\lambda(x) \in A$ for a finite collection of finite words $A$ as a shortcut for $\bigvee_{u \in A} \lambda(x) = u$;
and finally $y > x + n$ and $y < x + n$ for $n \in \mathbb{N}$ with the natural interpretation. First, we verify
that we see the sequence of $v_i$'s after the last factor $w$ and that after this last $w$ we do not have
factors of length $k$ which are not in $A$. This is done by the formula

$$\exists x : \lambda(x) = w \wedge \forall y > x : \lambda(y) \in A \wedge \exists y > x + k : \nu_1(y) \qquad (3)$$

with $\nu_i(x) \in \mathrm{FO}^2[<,+1]$ expressing that the suffix starting at $x$ is in $\Gamma^* v_i \Gamma^* v_{i+1} \cdots \Gamma^* v_{rs} \Gamma^\infty$.
This is achieved by the inductive construction $\nu_i(x) = \exists y > x : \lambda(y) = v_i \wedge \exists x > y + k : \nu_{i+1}(x)$
for $i \le rs$ and $\nu_i(x) = \top$ else.

By the finite case [14], we see that $[h(u)]$ and every language $[h(u_i)]$ is definable in $\mathrm{FO}^2[<,+1]$
and hence so is $U_i$ because we can specify suffixes and words shorter than $k$ explicitly in
$\mathrm{FO}^2[<,+1]$. Let $\mu, \mu_i \in \mathrm{FO}^2[<,+1]$ such that $[h(u)] = L(\mu)$ and $U_i = L(\mu_i)$. We use a relativization
technique to restrict the interpretation of $\mu_i$ to the interval $I_i$ comprising all positions
strictly between the $v_{i-1}$ and $v_i$ (with $v_0 = w$ for convenience).

For this we inductively construct formulas Tif{x) and f]^ {x) in F0^[<,+1] 

rif{x) = 3y > x\/x < y: X(x) = Vi^ ^'nf-i 
r]f{x) = 3y:x = y + k + lA -^r]f {y) 

with r]Q{x) = 3y>x: X{y) = w and rj^ {x) = 3y: x = y + k + 1 A -^r]^{y). 

With these prerequisites we now define the relativization {^jJ)■ of a formula tp to the interval 
li by the rule {3x: = 3x: r]^_^ A rjf A Boolean connectives and atomic variables are 

straightforward and the universal quantifier is then given by the equivalence Vx: ip = -i3x: 
Therefore, we get that P{a) is defined by the conjunction of the formula in (3) and the sentence 
(A')o ^ Aj ™ F0^[<, +1] where we set rj^^ = T for brevity. 

The size of this formula is bounded by a constant depending only on \A\^ and \M\. These 
parameters do not depend on $\alpha$ and therefore, there are only finitely many languages $P(\alpha)$ when
$\alpha$ varies over $L$. Now,

$$L = \bigcup_{\alpha \in L} P(\alpha)$$

is a finite union. Hence, $L$ is definable in $\mathrm{FO}^2[<,+1]$. □

We are now ready to prove Theorem 4.1. 

Proof (Theorem 4.1): "1 $\Rightarrow$ 3" is Lemma 4.5. The implication "3 $\Rightarrow$ 2" follows with Lemma 4.6;
and "2 $\Rightarrow$ 1" is Lemma 4.7. □



5 The first-order fragment $\Delta_2$

Over finite words, the fragments $\mathrm{FO}^2[<,+1]$ and $\Delta_2[<,+1]$ have the same expressive power [14,
26]. This is not true for infinite words. Here, it turns out that $\Delta_2[<,+1]$ is a strict subclass of
$\mathrm{FO}^2[<,+1]$ and that the $\Delta_2[<,+1]$-languages are exactly the clopen languages in $\mathrm{FO}^2[<,+1]$.

Theorem 5.1 Let $L \subseteq \Gamma^\infty$ be a language. The following are equivalent:

(1) $L$ is $\Delta_2[<,+1]$-definable.

(2) $L$ is $\mathrm{FO}^2[<,+1]$-definable and clopen in the factor topology.

Proof: "1 $\Rightarrow$ 2": Since $L$ is definable in $\Delta_2[<,+1]$, we get that it is open by Theorem 3.1 and
that it is closed by Theorem 3.4. Moreover, we get by these theorems that $\mathrm{Synt}(L)$ is locally
path-top as well as locally path-bottom. Proposition 4.3 yields $\mathrm{Synt}(L) \in \mathbf{LDA}$ and Theorem 4.1
shows $L \in \mathrm{FO}^2[<,+1]$.

"2 $\Rightarrow$ 1": By Theorem 4.1 $\mathrm{Synt}(L) \in \mathbf{LDA}$ and by Proposition 4.3 $\mathrm{Synt}(L)$ is locally path-top
as well as locally path-bottom. By Theorems 3.1 and 3.4 we get that $L$ is definable in $\Sigma_2[<,+1]$
and $\Pi_2[<,+1]$. □

A consequence of Theorem 5.1 is that $\Delta_2[<,+1]$ is a strict subclass of $\mathrm{FO}^2[<,+1]$. In fact, it
is a strict subclass of the intersection of the fragments $\mathrm{FO}^2[<,+1]$ and $\Sigma_2[<,+1]$.

Corollary 5.2 Over $\Gamma^\infty$, the fragment $\Delta_2[<,+1]$ is a strict subclass of the fragment $\mathrm{FO}^2[<,+1] \cap
\Sigma_2[<,+1]$ and also of the fragment $\mathrm{FO}^2[<,+1] \cap \Pi_2[<,+1]$.

Proof: The set of non-empty finite words $\Gamma^+$ is defined by the sentence

$$\exists x \, \forall y : y \le x$$

in $\mathrm{FO}^2[<] \cap \Sigma_2[<]$. We have to show that $\Gamma^+$ is not definable in $\Pi_2[<,+1]$. By Theorem 3.4 it
suffices to show that $\Gamma^+$ is not factor closed. Let $a \in \Gamma$, and consider the word $\alpha = a^\omega \notin \Gamma^+$. Every
factor open set containing $\alpha$ also contains some finite word $a^m \in \Gamma^+$. Hence, the complement of
$\Gamma^+$ is not factor open, and therefore, $\Gamma^+$ is not factor closed. By complementation, we see that
$\Gamma^\omega$ is definable in $\mathrm{FO}^2[<] \cap \Pi_2[<]$ but not in $\Delta_2[<,+1]$. □
Example 5.3 We consider another language which is definable in $\mathrm{FO}^2[<] \cap \Sigma_2[<]$ but not in
$\Delta_2[<,+1]$. Let $\Gamma = \{a, b\}$ and $L_3 = \Gamma^* a b^\infty$. The language $L_3$ is $\mathrm{FO}^2[<] \cap \Sigma_2[<]$-definable:

$$\exists x \, \forall y : \lambda(x) = a \wedge (\lambda(y) = a \Rightarrow y \le x).$$

In order to show that $L_3$ is not definable in $\Pi_2[<,+1]$, it suffices to show that $L_3$ is not factor
closed (Theorem 3.4). Let $k \in \mathbb{N}$. Every open set containing the word $(b^k a)^\omega \notin L_3$ also contains
some word $(b^k a)^m b^k \in L_3$. Hence, the complement of $L_3$ is not $k$-factor open, and therefore,
there is no $k$ such that $L_3$ is closed in the $k$-factor topology.

The same reasoning also works over $\Gamma^\omega$, since the language of all infinite words is definable in
$\Pi_2[<,+1]$. Hence, $L_3 \cap \Gamma^\omega = \Gamma^* a b^\omega$ is definable in $\Sigma_2[<]$ over infinite words and in $\mathrm{FO}^2[<]$ but not
in $\Delta_2[<,+1]$ over infinite words. This language is the standard example of a language which
cannot be recognized by a deterministic Büchi automaton [28, Example 4.2]. In particular, none
of the fragments $\mathrm{FO}^2[<,+1]$ or $\Sigma_2[<,+1]$ contains only deterministic languages. ◇

Example 5.4 Let $\Gamma = \{a, b, c\}$ and consider the language $L_4 = (\Gamma^2 \setminus \{bb\})^{\circledast} \circ aa \circ (\Gamma^2)^{\circledast}$ consisting
of all words such that there is no factor $bb$ before the first factor $aa$. The language $L_4$ is defined
by the $\Sigma_2[<,+1]$-sentence

$$\exists x \, \forall y < x : \lambda(x) = aa \wedge \lambda(y) \neq bb.$$

Here, $\lambda(x) = w$ is a shortcut saying that a factor $w$ starts at position $x$. A word $\alpha$ is in $L_4$ if and
only if $aa$ is a factor of $\alpha$ and for every factor $bb$ there is a factor $aa$ to the left. These properties
are $\Pi_2[<,+1]$-definable and hence $L_4 \in \Delta_2[<,+1]$. The language $L_4$ is not definable in any of
the fragments $\mathrm{FO}^2[<]$, $\Sigma_2[<]$, or $\Pi_2[<]$ without successor, since its syntactic monoid is neither
locally top nor locally bottom, cf. [7]. The language $L_4 \cap \Gamma^*$ has been used as an example of a
language not definable in the Boolean closure of $\Sigma_2[<]$ over finite words by Almeida and Klíma [2,
Proposition 6.1] as well as by Lodaya, Pandya, and Shah [14, Theorem 4]. The Boolean closure
of $\Sigma_2[<]$ over finite words coincides with the second level of the Straubing-Thérien hierarchy,
cf. [18, 27]. ◇
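For finite words, the two descriptions of $L_4$ given in this example can be compared by brute force. The sketch below is restricted to finite words and uses 0-based positions; the function names `in_L4_formula` and `in_L4_description` are hypothetical.

```python
from itertools import product


def in_L4_formula(w):
    """Exists x such that aa starts at x and bb starts at no position y < x."""
    return any(
        w.startswith("aa", x) and not any(w.startswith("bb", y) for y in range(x))
        for x in range(len(w))
    )


def in_L4_description(w):
    """aa is a factor, and every occurrence of bb has an occurrence of aa to its left."""
    aa = [i for i in range(len(w)) if w.startswith("aa", i)]
    bb = [i for i in range(len(w)) if w.startswith("bb", i)]
    return bool(aa) and all(any(x < y for x in aa) for y in bb)


def agree_up_to(length, alphabet="abc"):
    """Check that both descriptions coincide on all words up to the given length."""
    return all(
        in_L4_formula("".join(t)) == in_L4_description("".join(t))
        for n in range(length + 1)
        for t in product(alphabet, repeat=n)
    )
```

The first function mirrors the existential sentence, the second the reformulation used to argue definability in the dual fragment; `agree_up_to` compares them exhaustively on short words.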

6 The first-order fragments $\mathrm{FO}^2 \cap \Sigma_2$ and $\mathrm{FO}^2 \cap \Pi_2$

In this section, we show that topological concepts can not only be used as an ingredient for characterizing
first-order fragments, but also for describing some relations between fragments. More
precisely, we relate languages definable in both $\Sigma_2[<,+1]$ and $\mathrm{FO}^2[<,+1]$ with the interiors of
$\mathrm{FO}^2[<,+1]$-languages with respect to the factor topology. Dually, the languages in the fragment
$\mathrm{FO}^2[<,+1] \cap \Pi_2[<,+1]$ are precisely the topological closures of $\mathrm{FO}^2[<,+1]$-languages. Remember
that for a language $L$, its closure $\overline{L}$ is the intersection of all closed sets containing $L$. It can be
"computed" as

$$\overline{L} = \{\alpha \in \Gamma^\infty \mid \forall U \subseteq \Gamma^\infty \text{ open with } \alpha \in U : U \cap L \neq \emptyset\}.$$

The interior of $L$ is the union of all open sets contained in $L$. The interior of a language is the
complement of the closure of its complement.
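The duality between closure and interior can be observed in any topological space. The following toy sketch uses a finite space given by a basis of open sets; it is not the factor topology itself, and the names `closure` and `interior` as well as the example space are assumptions made for illustration.

```python
def closure(L, space, basis):
    """Points all of whose basic open neighbourhoods meet L."""
    return {p for p in space if all(U & L for U in basis if p in U)}


def interior(L, basis):
    """Union of all basic open sets contained in L."""
    return set().union(*[U for U in basis if U <= L])


# A small example space with a basis of open sets (hypothetical data).
SPACE = {0, 1, 2, 3}
BASIS = [{0}, {0, 1}, {2, 3}, {0, 1, 2, 3}]
```

For $L = \{1, 2, 3\}$ the interior is $\{2, 3\}$, and the closure of the complement $\{0\}$ is $\{0, 1\}$, whose complement is again $\{2, 3\}$.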

Theorem 6.1 Let $L \subseteq \Gamma^\infty$ be a regular language. The following are equivalent:

(1) $L \in \mathrm{FO}^2[<,+1] \cap \Sigma_2[<,+1]$.

(2) $L \in \mathrm{FO}^2[<,+1]$ and $L$ is open in the factor topology.

(3) $L$ is the factor interior of some $\mathrm{FO}^2[<,+1]$-definable language.
Proof: By complementation, the proof follows from Theorem 6.2 below. □

The equivalence of (1) and (2) is an immediate consequence of Theorems 3.1 and 4.1. The
surprising property is (3); for example, it is not obvious that the factor interior of an $\mathrm{FO}^2[<,+1]$-definable
language is again in $\mathrm{FO}^2[<,+1]$. It is slightly easier to first prove Theorem 6.2, and
then conclude Theorem 6.1, than the other way round. The reason is that "computing" the
closure is slightly easier than "computing" the interior.

Theorem 6.2 Let $L \subseteq \Gamma^\infty$ be a regular language. The following are equivalent:

(1) $L \in \mathrm{FO}^2[<,+1] \cap \Pi_2[<,+1]$.

(2) $L \in \mathrm{FO}^2[<,+1]$ and $L$ is closed in the factor topology.

(3) $L$ is the factor closure of some $\mathrm{FO}^2[<,+1]$-definable language.

Proof: "1 $\Rightarrow$ 2": If $L$ is in $\Pi_2[<,+1]$, then by Theorem 3.4, the language $L$ is factor closed.
"2 $\Rightarrow$ 3": If $L$ is closed, then $L = \overline{L}$.

"3 =^ 1": By Theorem 4.1 and Theorem 3.4 it suffices to show that Synt(L) is in LDA. The 
factor interior of a regular language is regular. More precisely, a Büchi automaton recognizing the
interior is effectively computable by Lemma 2.4. Since Büchi automata are effectively closed under
complementation, the language L is regular. Let L be fc-factor closed and let n > |Synt(L)| + 
|Synt(L)| + k, let p € r+, and let q,r e T*. We set 

u = (p" qp^rp^Y p^qp^ (p" qp^rp^Y, 
V = (p" qp"- rp^)"' 

We have 

xuyz^ £ L <^ xvyz'^ G L 

for all x,y,z G F*. By left-right symmetry, it suffices to show the implication from left to right. 
Let s be a finite prefix of z'^ . Since xuyz"^ G L there exists /3 G alph^(2;'^)®' with xuys o /3 G L. 
Then, since Synt(L) G LDA, we have xvys o /3 G L. Hence, xvyz'^ G L. Moreover 

x{uyY eL 4^ x{vyY G L 

for all x^y G F*. Again by left-right symmetry, it suffices to show the implication from left 
to right. Let m > 1 and consider the prefix {vy)^ of {vyY . Since x{uyY S L there exists 
j3 G alph^((uy)'^) with x{uy)^ o /3 £ L. By choice of n, we have alph^((uy)'^) = alph^((uy)'^) 
and last/c_i((M?/)™') = lastfc_i((f y)™). Since Synt(L) G LDA, we have x{vy)'^ o /3 G -L. Hence, 
x{vyY G L. □ 



7 Summary 

We considered fragments of first-order logic over finite and infinite words. As binary predicates
we allow order comparison $x < y$ and the successor predicate $x = y + 1$. Figure 1 depicts the
relation between the fragments $\Sigma_2[<,+1]$, $\Pi_2[<,+1]$, and $\mathrm{FO}^2[<,+1]$. Moreover, the languages
$L_1$, $L_2$, $L_3$, and $L_4$ from Examples 4.4, 5.3, and 5.4 are included. For the other languages, we fix
$\Gamma = \{a, b, c\}$ and $\emptyset \neq A \subseteq \Gamma$.

Figure 1: The fragments $\Sigma_2[<,+1]$, $\Pi_2[<,+1]$, and $\mathrm{FO}^2[<,+1]$ over $\Gamma^\infty$

The central notion for presenting our results is a partially defined composition $u \circ_k v = u'xv'$
where $u = u'x$, $v = xv'$, and $|x| = k - 1$. Using this composition, one can show that the class of languages
definable in $\Sigma_2[<,+1]$ is exactly the class of factor polynomials. Moreover, the composition
leads to the $k$-factor topology, which we use in further characterizations of the successor
fragments. A language $L$ is factor open if there exists some number $k$ such that $L$ is $k$-factor open. For
every regular language $L$, Proposition 2.2 gives a bound $k$ such that $L$ is factor open if and only
if $L$ is $k$-factor open. Then, in Proposition 2.3, we essentially show that for a given number $k$ it
is decidable whether a regular language $L$ is $k$-factor open. Altogether, in order to check whether
$L$ is factor open, we can check whether $L$ is $k$-factor open, with $k$ being the bound given by
Proposition 2.2. Hence, the topological properties, which we use in the characterizations of the
fragments, are decidable. Together with the decidable algebraic properties, this gives a decision
procedure for deciding whether a given regular language $L \subseteq \Gamma^\infty$ or $L \subseteq \Gamma^\omega$ is definable in one
of the fragments under consideration. In Table 1 we summarize our main results. All fragments
use the binary predicates $[<,+1]$. The first decidable characterization of $\mathrm{FO}^2[<,+1]$ is due to
Wilke [31]. Decidability for $\Sigma_2[<,+1]$ over infinite words is new (Corollary 3.3).
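The partially defined composition from the beginning of this summary can be sketched for finite words as follows. The function names `compose` and `factors` are hypothetical, and returning `None` for an undefined composition is a choice made here for illustration.

```python
def compose(u, v, k):
    """Return u o_k v = u'xv' if u = u'x and v = xv' with |x| = k - 1, else None."""
    if k < 1:
        raise ValueError("k must be positive")
    m = k - 1
    if len(u) < m or len(v) < m:
        return None
    x = u[len(u) - m:]  # suffix of u of length k - 1 (empty for k = 1)
    if v[:m] != x:
        return None
    return u + v[m:]


def factors(w, k):
    """Set of all factors of length k of the finite word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}
```

Because the overlap has length $k - 1$, every factor of length $k$ of $u \circ_k v$ already occurs in $u$ or in $v$, which is why the composition interacts well with restrictions on factors of length $k$. For $k = 1$ the composition is ordinary concatenation.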



Logic                          Languages                            Algebra             + Topology

$\Sigma_2$                     factor polynomials                   $e P_e e \le e$     + factor open               Thm. 3.1
$\Pi_2$                                                             $e P_e e \ge e$     + factor closed             Thm. 3.4
$\mathrm{FO}^2$                                                     LDA                                             Thm. 4.1
                                                                    weak LDA            + strictly factor closed
$\Delta_2$                                                          LDA                 + factor clopen             Thm. 5.1
$\mathrm{FO}^2 \cap \Sigma_2$  factor interior of $\mathrm{FO}^2$   LDA                 + factor open               Thm. 6.1
$\mathrm{FO}^2 \cap \Pi_2$     factor closure of $\mathrm{FO}^2$    LDA                 + factor closed             Thm. 6.2

Table 1: Main characterizations of some first-order fragments



Open problems The fragment $\Sigma_2[<,+1]$ has a language description in terms of factor polynomials.
Without the successor predicate similar characterizations in terms of so-called unambiguous
polynomials exist for the fragments $\mathrm{FO}^2[<]$, for $\mathrm{FO}^2[<] \cap \Sigma_2[<]$, and for $\Delta_2[<]$, cf. [7]. It is open
whether these fragments admit similar characterizations if we allow the successor predicate.

Moreover, for the fragment $\Delta_2[<,+1]$ we only have an implicit decidable characterization
based on the decidability of $\Sigma_2[<,+1]$ and $\Pi_2[<,+1]$ (or alternatively, based on the decidability
of $\mathrm{FO}^2[<,+1]$ and being clopen). A more direct characterization of this fragment remains open.
For $\Delta_2[<]$ without successor, such a characterization shows that all languages in $\Delta_2[<]$ over
infinite words are recognizable by deterministic Büchi automata.

Another important fragment is $\mathbb{B}\Sigma_1$, the Boolean closure of $\Sigma_1$. A result of Knast [12] shows
that, over finite words, it is decidable whether a regular language is definable in the logic
$\mathbb{B}\Sigma_1[<,+1,\min,\max]$, which over finite words corresponds to the first level of the dot-depth
hierarchy. A similar result over infinite words is still missing.